Shortcuts: WD:RFBOT, WD:BRFA, WD:RFP/BOT

Wikidata:Requests for permissions/Bot

To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks. Then transclude that page onto this page, like this: {{Wikidata:Requests for permissions/Bot/RobotName}}.

Old requests go to the archive.

Once consensus is obtained in favor of granting the bot flag, please post the request at the bureaucrats' noticeboard.



Bot Name Request created Last editor Last edited
AmmarpadBot 2021-06-10, 13:55:19 Mike Peel 2021-06-11, 08:26:13
VolleyballBot 2021-06-08, 23:12:12 BrokenSegue 2021-06-10, 01:16:53
Adecco-wiki-bot 2021-05-25, 11:19:49 Ymblanter 2021-06-11, 18:04:45
SBBUpdaterBot 2021-06-03, 14:31:56 SBBUpdaterBot 2021-06-10, 11:27:52
BorkedBot 5 2021-05-31, 07:01:29 Epìdosis 2021-05-31, 09:43:17
Tmdbzhbot 2021-05-25, 22:47:06 Ymblanter 2021-06-03, 18:49:51
SmartifyBot 2021-05-18, 16:59:28 Bovlb 2021-06-02, 19:52:24
Josh404Bot 3 2021-05-16, 19:01:10 Josh404 2021-05-17, 01:18:23
MsynBot 7 2021-05-07, 17:27:45 Lymantria 2021-05-08, 07:27:01
Josh404Bot 2 2021-05-04, 00:59:19 Lymantria 2021-05-11, 05:10:15
Sailbot 2021-04-28, 05:06:34 Lymantria 2021-05-08, 07:26:05
METbot 2021-04-28, 02:07:37 Jura1 2021-05-16, 11:44:19
Mbchbot 2 2021-04-19, 19:49:18 Mbch331 2021-04-28, 19:43:40
EdwardAlexanderCrowley (flood) 2021-04-20, 13:32:39 EdwardAlexanderCrowley 2021-04-26, 05:56:15
Njzjzbot 2021-04-18, 07:34:31 Ymblanter 2021-04-28, 19:23:02
InforegisterIDupdater 2021-04-07, 06:28:39 BrokenSegue 2021-04-16, 13:49:22
ComplexPortalBot 2021-02-23, 17:06:42 Jura1 2021-05-31, 10:38:20
FerdiBot 2021-04-08, 15:44:31 Ymblanter 2021-04-15, 19:18:43
So9qBot 2021-04-05, 13:59:04 GZWDer 2021-05-05, 02:50:44
Josh404Bot 1 2021-04-08, 01:28:27 Lymantria 2021-04-19, 05:16:12
NikkiBot 4 2021-04-01, 16:17:27 Lymantria 2021-04-11, 17:50:47
AVSBot 2021-03-29, 18:06:59 Avsolov 2021-04-04, 18:13:45
Pi bot 21 2021-03-24, 21:30:52 Jura1 2021-04-27, 17:15:25
Pi bot 20 2021-03-24, 19:38:36 Ymblanter 2021-03-26, 22:37:51
WikiportraitBot 2021-03-18, 20:48:22 Ymblanter 2021-03-22, 19:38:15
ZabesBot 2 2021-03-14, 15:59:22 Zabe 2021-04-15, 10:14:18
QuebecLiteratureBot 2021-02-27, 01:05:19 Multichill 2021-04-25, 10:48:29
Lockalbot 1 2021-02-10, 18:52:05 Ymblanter 2021-02-13, 21:21:07
taxonbot 2021-02-08, 20:27:35 Lymantria 2021-03-24, 06:19:04
Openaccess cma 2021-02-05, 15:24:35 Multichill 2021-04-25, 10:53:43
MarcoBotURW 2021-03-11, 18:56:21 MAstranisci 2021-03-13, 09:58:32
ZabesBot 1 2021-01-26, 14:10:36 Lymantria 2021-02-07, 09:37:00
NicereddyBot 6 2021-01-23, 05:25:25 Multichill 2021-04-25, 10:52:49
BorkedBot 4 2021-01-22, 21:52:55 Lymantria 2021-03-10, 06:17:11
BubblySnowBot 2021-01-22, 07:35:45 Ymblanter 2021-02-13, 21:28:14
GZWDer (flood) 6 2021-01-12, 20:03:14 GZWDer 2021-02-09, 12:46:53
DutchElectionsBot 2020-12-28, 15:14:47 Lymantria 2021-04-11, 17:52:22
JarBot 5 2020-12-19, 05:40:32 Jura1 2021-03-08, 09:13:38
Cewbot 4 2020-11-27, 03:49:43 Kanashimi 2020-12-04, 06:09:53
Datatourismebot 2020-11-23, 23:14:16 Conjecto 2020-11-23, 23:14:16
Fab1canBot 2020-11-01, 14:00:02 Fab1can 2020-11-01, 14:00:02
BorkedBot 3 2020-10-29, 02:49:29 Lymantria 2021-02-06, 10:36:48
romedi 1 2020-10-24, 13:52:40 Lymantria 2020-11-01, 09:55:26
FischBot 8 2020-10-05, 23:40:58 Envlh 2021-05-19, 11:50:35
RegularBot 3 2020-09-09, 01:09:09 Ladsgroup 2020-10-10, 18:52:50
RegularBot 2 2020-08-08, 07:28:57 Mike Peel 2021-01-03, 19:28:53
RegularBot 2020-08-04, 13:25:12 Jura1 2020-08-15, 17:02:52
Orcbot 2020-07-30, 14:17:20 EvaSeidlmayer 2021-01-14, 20:30:18
OpenCitations Bot 2020-07-29, 13:23:50 Diegodlh 2021-02-15, 22:05:49
TwPoliticiansBot 2020-07-12, 14:31:33 TwPoliticiansBot 2020-07-12, 14:31:33
T cleanup bot 2020-06-21, 17:39:23 Jura1 2020-12-07, 21:23:34
OlafJanssenBot 2020-06-11, 21:45:35 Lymantria 2020-06-26, 08:07:22
Recipe Bot 2020-05-20, 14:21:59 Haansn08 2020-09-27, 09:40:31
LouisLimnavongBot 2020-05-14, 13:09:17 Hazard-SJ 2020-11-03, 06:51:31
BsivkoBot 3 2020-05-08, 13:25:37 Bsivko 2020-05-08, 13:28:25
BsivkoBot 2 2020-05-08, 12:50:25 Jura1 2020-05-19, 10:37:06
DeepsagedBot 1 2020-04-14, 06:16:52 Pamputt 2020-08-03, 18:35:01
Uzielbot 2 2020-04-07, 23:49:11 Jura1 2020-05-16, 13:23:42
WordnetImageBot 2020-03-18, 12:17:03 DannyS712 2020-07-07, 12:03:42
Lamchuhan-hcbot 2020-03-24, 08:06:07 GZWDer 2020-03-24, 08:06:07
GZWDer (flood) 3 2018-07-23, 23:08:28 1234qwer1234qwer4 2021-01-25, 14:02:11
MusiBot 2020-02-28, 01:01:19 Premeditated 2020-03-18, 09:43:03
AitalDisem 2020-01-14, 15:48:04 Hazard-SJ 2020-10-07, 06:10:56
BsivkoBot 2019-12-28, 19:38:23 Bsivko 2020-05-08, 12:35:10
Antoine2711bot 2019-07-02, 04:25:58 MisterSynergy 2020-10-29, 21:32:21

AmmarpadBot

AmmarpadBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Ammarpad (talk • contribs • logs)

Task/s: Replace stated as (P1932) qualifier with named as (P1810) for GND ID (P227) statement.

Code: update_gnd_id_qualifiers.py

Function details:

Requested at Wikidata:Bot_requests#request_to_replace_qualifiers_in_GND_ID_(2021-06-07)
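
For illustration, a minimal pywikibot sketch of this kind of qualifier swap; this is not the linked update_gnd_id_qualifiers.py, and the item ID at the end is only a placeholder:

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def swap_gnd_qualifiers(item_id):
    # Move "stated as" (P1932) qualifier values to "named as" (P1810) on GND ID (P227) claims.
    item = pywikibot.ItemPage(repo, item_id)
    item.get()
    for claim in item.claims.get('P227', []):
        for old in list(claim.qualifiers.get('P1932', [])):
            new = pywikibot.Claim(repo, 'P1810', is_qualifier=True)
            new.setTarget(old.getTarget())
            claim.addQualifier(new, summary='replace P1932 qualifier with P1810 on P227')
            claim.removeQualifier(old, summary='replace P1932 qualifier with P1810 on P227')

swap_gnd_qualifiers('Q42')  # placeholder item; the real script works from a request/query list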

--Ammarpad (talk) 13:54, 10 June 2021 (UTC)

  •  Support I've been working with Ammarpad through an Outreachy internship; while this task isn't directly related to that, their coding skills are good and they will make a good bot operator here. Thanks. Mike Peel (talk) 08:26, 11 June 2021 (UTC)

VolleyballBot

VolleyballBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operators:

We just need to approve User:VolleyballBot for the same tasks as the already-approved User:Valerio Bozzolan bot. The relevant RFP: Wikidata:Requests for permissions/Bot/Valerio Bozzolan bot 2.

With User:CristianNX we decided it was better to have a dedicated account for volleyball-related bot activity.

Thank you so much for your time! I'm available for comments. --Valerio Bozzolan (talk) 23:12, 8 June 2021 (UTC)

Adecco-wiki-bot

Adecco-wiki-bot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: 90.221.178.12 (talk • contribs • logs)

Task/s:

Code:

Function details: --90.221.178.12 11:19, 25 May 2021 (UTC)

  •  Oppose IPs cannot be operators and there are no details. BrokenSegue (talk) 01:14, 10 June 2021 (UTC)
And, until there is an operator with a credible account, nobody would consider this request seriously.--Ymblanter (talk) 18:04, 11 June 2021 (UTC)

SBBUpdaterBot

SBBUpdaterBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: DFB98 (talk • contribs • logs)

Task/s: Update Open Days on Wikidata for SBB train stations

Code:

Function details: The Bot uses data from data.sbb.ch to update Wikidata entries of SBB train stations. --SBBUpdaterBot (talk) 14:31, 3 June 2021 (UTC)

  • I added myself as the operator of the bot. DFB98 (talk) 17:23, 3 June 2021 (UTC)
  • How many edits will this bot make and how often? The samples seem fine but I'm a little worried about giving the bot flag to someone with zero edit history across all wikis. BrokenSegue (talk) 20:55, 3 June 2021 (UTC)
+ How frequently do these change? I'm not sure if they would be that interesting to have if they quickly get outdated and won't be kept up-to-date. --- Jura 18:11, 9 June 2021 (UTC)
If you add new statements, please use more explicit edit summaries like "adding" or "creating" instead of "updating", e.g. "adding [[Property:P3025]] with [[Property:P8626]] and [[Property:P8627]]".
If you update items with new times, please create a new statement with preferred rank (do not remove/delete/overwrite existing statements).
Are you affiliated with SBB? --- Jura 18:11, 9 June 2021 (UTC)
  • Thanks for the hints. I will adjust the bot accordingly. I found out why I accidentally added 9 AM (Q41618181) instead of 09:00 (Q55811413), and fixed the problem in the bot. Yes, I am affiliated with SBB. The bot is part of a university project, but once it is done it will be handed over to SBB, who will operate it. I am not sure if they are going to run it daily, but it will run at least weekly. SBBUpdaterBot (talk) 11:27, 10 June 2021 (UTC).
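
As an aside, a minimal pywikibot sketch of the pattern Jura describes above (add the new value as a separate preferred-rank statement instead of overwriting); the property and values are placeholders, and this is not the bot's own code:

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def add_updated_value(item_id, prop, new_target, summary):
    item = pywikibot.ItemPage(repo, item_id)
    item.get()
    # Keep the old statements, but demote any currently-preferred ones to normal rank.
    for old in item.claims.get(prop, []):
        if old.rank == 'preferred':
            old.changeRank('normal')
    # Add the new value as a separate statement and mark it preferred.
    claim = pywikibot.Claim(repo, prop)
    claim.setTarget(new_target)
    item.addClaim(claim, summary=summary)
    claim.changeRank('preferred')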

BorkedBot 5

BorkedBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: BrokenSegue (talk • contribs • logs)

Task/s: Find dates to uprank based on their being more precise than the other dates.

Code: See the "prefer_dates" folder in the github repo (linked from the bot's user page)

Function details:

The bot

  • Finds items with multiple dates for a single property (currently just date of birth (P569), but I want to expand to all popular date properties)
  • Finds the most precise date that has references (excluding references back to a Wikipedia)
  • Ensures all other dates are compatible with that date (use the same calendar, are equal in the relevant date pieces)
  • Upranks the best date and adds reason for preferred rank (P7452) most precise value (Q71536040)

For now we are excluding dates that have qualifiers as those can complicate the situation.
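
For reference, a rough sketch of the selection and up-ranking logic described above; this is not the code in the linked prefer_dates folder, the Wikipedia-reference check is simplified here to "no P143 in the source", and the calendar/date-piece compatibility test is omitted:

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def pick_best_date(claims):
    # Most precise date that has at least one reference, ignoring references that only
    # point back to a Wikipedia via "imported from Wikimedia project" (P143).
    referenced = [c for c in claims
                  if c.getTarget() and any('P143' not in source for source in c.sources)]
    return max(referenced, key=lambda c: c.getTarget().precision, default=None)

def uprank_best_date(item_id):
    item = pywikibot.ItemPage(repo, item_id)
    item.get()
    claims = item.claims.get('P569', [])
    best = pick_best_date(claims)
    if len(claims) < 2 or best is None:
        return
    best.changeRank('preferred')
    qual = pywikibot.Claim(repo, 'P7452', is_qualifier=True)   # reason for preferred rank
    qual.setTarget(pywikibot.ItemPage(repo, 'Q71536040'))      # most precise value
    best.addQualifier(qual)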

I made a bunch of test edits but they might be hard to find in the bot's history. Here are some: [2] [3] [4] [5] [6]

This is in response to the request at Wikidata:Bot_requests#request_to_automate_marking_preferred_rank_for_full_dates._(2021-05-28) which gives a rationale for the task.

--BrokenSegue (talk) 07:01, 31 May 2021 (UTC)

  •  Sounds good. Personally, I wouldn't add the qualifier. If you do add it, maybe the value should mention exactly what the bot did to select it (most precise value + reference + no incompatible years). --- Jura 07:14, 31 May 2021 (UTC)
  •  Support everything sounds good; if you add the qualifier reason for preferred rank (P7452) most precise value (Q71536040), maybe you can try adding it also to cases where the best date has already been upranked (for uniformity). --Epìdosis 09:43, 31 May 2021 (UTC)

SmartifyBot

SmartifyBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)

Operator: Rob Lowe - Smartify (talk • contribs • logs)

Task/s: I am collaborating with the Yale Center for British Art to upload all their public domain artworks (not just paintings) to Wikidata and Commons. There are approximately 42,000 works to load.

Code: Uses pywikibot and is based on artdatabot.py and wikidata_uploader.py by Multichill (talk • contribs • logs). The code is rather specific to YCBA art at present, but I will attempt to generalise and release it.

Function details: See code mentioned above. But in more detail the code:

  • Creates a new Wikidata item for the artwork ...
  • Adds claims for:
    • instance of
    • image
    • inception
    • location
    • title
    • creator
    • made from material
    • collection
    • inventory number
    • width
    • height
    • depth
    • copyright license
    • described at URL
  • Creates a linked Wikimedia Commons {{Artwork}} item

All the Commons data is derived from the Wikidata item except for the medium. YCBA often has very lengthy medium descriptions that are not easily expressed in Wikidata, e.g. Aquatint and etching on medium, slightly textured, cream wove paper. The full text is used in Commons while a subset of the terms is used for the Wikidata item.

Although I had requested and received a bot flag for Commons, I had not done so for Wikidata (sincere apologies for that), and I had already uploaded approx. 6000 items before that was noticed!

An example of an upload may best demonstrate what has been done, here in Wikidata:

Fashionable Bores, or Coolers in High Life (Q106837635)

and here in Commons:

Fashionable Bores, or Coolers in High Life

More information about the project and in particular the use of CC0 for the copyright licence can be found on the request page for the SmartifyBot in Commons.


--Rob Lowe - Smartify (talk) 16:59, 18 May 2021 (UTC)

Hi BrokenSegue, thanks for your comments. I’ll address the simpler things first:
- I’ve changed the code to add the necessary qualifiers to the P973 statements. In my defence, except for the Mona Lisa and a few other famous works it’s hard to find a use of described at URL (P973) that doesn’t have a warning triangle by it.
- I’ve changed the code to make use of JPEG File Interchange Format (Q26329975) rather than JPEG (Q2195) to describe the JPEG files. But in any event these statements are transitory. The bot runs in two phases, it adds the artwork with Commons compatible image available at URL (P4765) and later loads the image to Commons, removes the P4765 and creates the image (P18) link to Commons.
- The far more serious issue is the one of duplicates. I thought I had taken a lot of steps to avoid duplicates, but it seems there is a problem. It’s worthwhile commenting on some history. In 2012 Google Arts and Culture added about 5000 images to Commons. Wikidata items were added for about 2100 of them in 2016. A few more Commons items have been added since then. To make sure I didn’t add duplicates to Commons I identified all the Yale works in both Wikidata and Commons and edited at least 150 manually to add accession number details so they could be identified reliably.
The Smartify bot starts by using a SPARQL query to discover the existing works and avoid them. As a newcomer to Wikidata bots, I wonder if I have made a rookie error in operating the bot. After adding a bunch of Wikidata items, how quickly are they visible to a subsequent SPARQL query? I think it is likely that I’ve stopped the bot and restarted it, the recently added records have not been discovered by the query, and so duplicates have been produced. I’ll definitely tighten this up.
Unfortunately about 400 duplicates have been produced. What is the best way to get them removed? Provide a list? A SPARQL query that lists them should also be possible. Rob Lowe - Smartify (talk) 13:46, 21 May 2021 (UTC)
Duplicate items should be merged. You can do this manually or with a bot. I don't know of a tool that takes a list and does it automatically though. BrokenSegue (talk) 17:26, 21 May 2021 (UTC)
I can understand a merge might be appropriate where the items have come from two different sources and you want to capture unique information from both. But here the duplicate has exactly the same author and information, created twice in error. Is deletion not better in this instance? I suppose you could overwrite the item in its entirety with information about a new artwork - the history would look a bit peculiar though. Rob Lowe - Smartify (talk) 17:28, 23 May 2021 (UTC)
You can request bulk deletion at WD:RFD but it's probably better just to merge them. We simply don't know if any external entities picked up the Wikidata item ids, no matter how short their existence, and a merge allows them to be corrected. Bovlb (talk) 23:45, 25 May 2021 (UTC)
@Bovlb: I've requested the deletion of 297 items, because I believe that's the correct way to proceed in this instance; my reasoning is on the WD:RFD page. But the request doesn't seem to have been actioned, nor has it been categorically rejected. So I'm not sure what to do... Rob Lowe - Smartify (talk) 18:00, 2 June 2021 (UTC)
@Rob Lowe - Smartify: I replied again there to explain in more detail why I recommend merging and oppose deletion in this case. Cheers, Bovlb (talk) 19:52, 2 June 2021 (UTC)
  • As I got tagged on this one: Can you please publish your code? Multichill (talk) 19:10, 21 May 2021 (UTC)
I will sort something out. Rob Lowe - Smartify (talk) 17:28, 23 May 2021 (UTC)
Hi @Multichill: sorry for the delay, other projects intervened. The source code is here. It hooks into the Smartify database to get the artworks - I haven't provided that code, but I hope it is fairly clear what is going on. It makes use of a modified version of your artdatabot.py code. I've extended it in a few places, which I've marked with my initials, RML. The changes are to:
- allow other types of artwork (not just paintings) and multiple instance of (P31) statements, so something can be an etching but also a print
- allow dates of the form 'after 1856'. I will probably need to add more date handling in due course.
- allow multiple made from material (P186) statements, not just oil on canvas
- allow multiple described at URL (P973) statements
With regard to the duplicates mentioned above, you can see in smartifybot.py that it gets a list of all the existing works in Wikidata using a SPARQL query and avoids them; except, it seems, not reliably. I was uploading works in batches of a few hundred at a time, and all I can think is that the SPARQL query is not returning recently added items, so if I immediately started a second batch the existing works were not detected and duplicates were inserted. Could you advise on this: is it a possible scenario? Rob Lowe - Smartify (talk) 18:00, 2 June 2021 (UTC)
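
On the query-lag question: the Wikidata Query Service is updated asynchronously, so items created moments earlier may not yet appear in a SPARQL result, which would explain duplicates when a new batch starts immediately after the previous one. One simple guard (a sketch only, not part of smartifybot.py; the file name and function names are illustrative) is to keep a local record of accession numbers already created and check it alongside the SPARQL snapshot:

import json
from pathlib import Path

SEEN_FILE = Path('created_accession_numbers.json')

def load_seen():
    # Accession numbers of items created by earlier batches, persisted between runs.
    return set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()

def record_created(accession_number):
    seen = load_seen()
    seen.add(accession_number)
    SEEN_FILE.write_text(json.dumps(sorted(seen)))

def should_create(accession_number, existing_from_sparql):
    # Create an item only if the work is in neither the SPARQL snapshot nor the local record.
    return (accession_number not in existing_from_sparql
            and accession_number not in load_seen())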

Josh404Bot 3

Josh404Bot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Josh404 (talk • contribs • logs)

Task/s:

Fill in missing TMDb TV series ID (P4983) values for items that have an associated IMDb ID (P345) or TheTVDB.com series ID (P4835), via the TMDb API.

Code:

https://github.com/josh/wikidatabots/compare/d2dec28...4cd7cea

SELECT ?item ?imdb ?tvdb ?random WHERE {
  # Items with either IMDb or TVDB IDs
  { ?item wdt:P4835 []. }
  UNION
  { ?item wdt:P345 []. }

  # P4983's type constraint
  VALUES ?classes {
    wd:Q15416
  }
  ?item (wdt:P31/(wdt:P279*)) ?classes.

  # Get IMDb and TVDB IDs
  OPTIONAL { ?item wdt:P345 ?imdb. }
  OPTIONAL { ?item wdt:P4835 ?tvdb. }

  # Exclude items that already have a TMDB TV ID
  OPTIONAL { ?item wdt:P4985 ?tmdb. }
  FILTER(!(BOUND(?tmdb)))

  # Generate random sorting key
  BIND(MD5(CONCAT(STR(?item), STR(RAND()))) AS ?random)
}
ORDER BY ?random
LIMIT 1000

Try it!

Function details:

This is a follow-up to Wikidata:Requests for permissions/Bot/Josh404Bot 2 and Wikidata:Requests for permissions/Bot/Josh404Bot 1. The task operates on TMDb related external IDs and shares similar code. One larger difference is that this bot task also cross-references TheTVDB.com series ID (P4835) in addition to IMDb ID (P345).

  1. Via SPARQL, find items that have either an IMDb ID (P345) or TheTVDB.com series ID (P4835) but DO NOT have a TMDb TV series ID (P4983). Accumulate results and remove duplicates.
  2. Use the TMDb API to look up the TV show ID by either the IMDb or TVDB ID when present.
  3. For any matches, add new statements for the item. This MAY add multiple distinct statements for a given item if the IMDb and TVDB IDs conflict with each other or when multiple IMDb IDs exist on a single item.

Recapping some notes that came up in previous reviews:

  • SPARQL results are accumulated client-side in a Python dictionary to better handle items with multiple IMDb IDs or TVDB IDs.
  • The TMDB API rejects invalid IDs on lookup. This handles the theoretical case of an "nm" IMDb ID present on a TV Wikidata item.
  • TMDB TV IDs are accumulated in a set to remove duplicates before generating statements.
  • Statements are submitted in batch via QuickStatements, which also acts as a failsafe for preventing duplicate statements.
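
For reference, a rough sketch of the lookup in step 2, assuming TMDb's /find endpoint; this is not the linked bot code, and the API key is a placeholder:

import requests

TMDB_API_KEY = '...'  # placeholder

def tmdb_tv_id(external_id, source):
    # source is 'imdb_id' or 'tvdb_id'; returns the TMDb TV series ID, or None if no match.
    r = requests.get(
        'https://api.themoviedb.org/3/find/%s' % external_id,
        params={'api_key': TMDB_API_KEY, 'external_source': source},
        timeout=10,
    )
    r.raise_for_status()
    tv_results = r.json().get('tv_results', [])
    return tv_results[0]['id'] if tv_results else None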

--Josh404 (talk) 19:01, 16 May 2021 (UTC)

Mentioning User:BrokenSegue, since they've given great feedback on past requests. Thanks! Josh404 (talk) 19:08, 16 May 2021 (UTC)
  •  Support So instead of doing optional -> filter not bound you can do FILTER NOT EXISTS but it's probably equivalent and fine both ways. Otherwise looks good to me. BrokenSegue (talk) 00:40, 17 May 2021 (UTC)
    Thanks for the review again!

    I recall originally writing this type of query with FILTER NOT EXISTS but started running into performance issues. Then I saw some suggestions to try the bind/filter approach. I'm not sure why the latter is often faster. Maybe the FILTER NOT EXISTS approach evaluates the subquery entirely rather than for each matching statement? I really haven't dug into the query optimizer stuff that much.

    16 secs https://w.wiki/3LXH vs 22 secs https://w.wiki/3LXK. The total number of records is at least small for this property, but I've even seen timeouts on larger sets. Josh404 (talk) 01:18, 17 May 2021 (UTC)

Sailbot

Sailbot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Simon.letort (talk • contribs • logs)

Task/s: Import sailboat data from English Wikipedia Infobox sailboat specifications.

Code: In progress here. Starting approval process prior to test run.

Function details: The English Wikipedia Infobox sailboat specifications contains data for around 1.5k sailboats. The WikiProject Sailing community regularly create new Wikipedia articles using that Infobox. A Wikidata Sailboat data model has also been defined. This bot implements consistent import of information about sailboats to Wikidata from Infobox sailboat specifications in line with the Sailboat data model.
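
For context, a minimal sketch of reading the infobox parameters from an English Wikipedia article with mwparserfromhell; this is illustrative only, and the bot code linked above is the actual implementation:

import mwparserfromhell
import requests

def sailboat_infobox(article_title):
    # Fetch the article wikitext and return the sailboat infobox parameters as a dict.
    wikitext = requests.get(
        'https://en.wikipedia.org/w/api.php',
        params={'action': 'parse', 'page': article_title, 'prop': 'wikitext',
                'format': 'json', 'formatversion': 2},
        timeout=10,
    ).json()['parse']['wikitext']
    for template in mwparserfromhell.parse(wikitext).filter_templates():
        if template.name.matches('Infobox sailboat specifications'):
            return {str(p.name).strip(): str(p.value).strip() for p in template.params}
    return None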

The following information is imported:

--Simon.letort (talk) 10:46, 29 April 2021 (UTC)

  • When you are ready, please do a test run.
BTW, maybe you have seen Wikidata:Project_chat#New_essay:_of_(P642)_considered_harmful. Accordingly, a qualifier other than P642 would be preferable, possibly criterion used (P1013).
Not sure what to make of "parameter 'draft' -> draft (P2262) and height (P2048) of (P642) draft (Q244777)" above. If P2262 is used, I wouldn't re-add the same with P2048. --- Jura 12:55, 29 April 2021 (UTC)

METbot

METbot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Fuzheado (talk • contribs • logs)

Task/s: This bot adds depicts (P180) statements to Wikidata items corresponding to Metropolitan Museum of Art artworks and the qualifier determination method (P459) -> Metropolitan Museum of Art Tagging Initiative (Q106429444) as well as "#metttagging" to the edit summary. It uses The Met's Open Access database and a controlled vocabulary of around 1,000 high-quality keyword tags that have already been reconciled to Wikidata Q numbers.

Code: Python code on PAWS here: https://public.paws.wmcloud.org/User:METbot/mettagger/mettagger_P180.ipynb

Function details: (copied from PAWS/Jupyter notebook)

The METtagger bot helps add high-quality depiction information to Wikidata items that have previously been created for Metropolitan Museum of Art works. It does so by using the weekly CSV dump The Met puts on Github.

TL;DR: This bot adds depicts (P180) statements to Wikidata items corresponding to Metropolitan Museum of Art artworks and the qualifier determination method (P459) -> Metropolitan Museum of Art Tagging Initiative (Q106429444) and adds "#metttagging" to the edit summary. It uses The Met's Open Access database and a controlled vocabulary of around 1,000 high-quality keyword tags that have already been reconciled to Wikidata Q numbers.

Since 2020, The Met has been including high-precision Wikidata Q numbers for many of their fields, which makes these bot tasks easier and more precise. These include Q numbers for:

  • objects/artifacts
  • creator/constituent
  • tag/depiction info

An example of the Wikidata_URL that The Met records in its database can be seen in this API call: https://collectionapi.metmuseum.org/public/collection/v1/objects/294500

The Met Github and CSV is here: https://github.com/metmuseum/openaccess

Bot procedure

The bot works by bringing in the CSV dump from The Met and finding out which objects have a Wikidata_URL. It then uses pywikibot to iterate through the list of Q items, checking what The Met has as depiction keywords (tags) and what the corresponding Wikidata item has as its P180. If the depiction statement implied by The Met keyword tag is not in Wikidata, the bot will add a new P180 statement via pywikibot. A second pass over the Wikidata item's P180 statements will add a qualifier to indicate that the P180 statement is sourced to The Met (even if the P180 statement was already there).

We are using the depicts (P180) qualifier determination method (P459) -> Metropolitan Museum of Art Tagging Initiative (Q106429444). This is to make it consistent with the same thing we are doing in Structured Data on Commons. Since SDC does not have "reference" statements like Wikidata, we felt it was better to use P459->Q106429444 on both Wikidata and Commons to stay consistent. We are open to discussing other ways to do this, but this seems sensible for now. An example diff: https://www.wikidata.org/w/index.php?title=Q78828856&diff=1409934952&oldid=1363169833
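
For reference, a condensed pywikibot sketch of the add-depicts step; the PAWS notebook linked under "Code" is the actual implementation, and the CSV handling and the qualifier-only second pass are omitted here:

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def add_met_depicts(item_id, depicted_qid):
    item = pywikibot.ItemPage(repo, item_id)
    item.get()
    existing = [c.getTarget().getID() for c in item.claims.get('P180', []) if c.getTarget()]
    if depicted_qid in existing:
        return  # already depicted; the second pass only adds the qualifier in that case
    claim = pywikibot.Claim(repo, 'P180')                            # depicts
    claim.setTarget(pywikibot.ItemPage(repo, depicted_qid))
    item.addClaim(claim, summary='adding depicts from The Met Tagging Initiative')
    qualifier = pywikibot.Claim(repo, 'P459', is_qualifier=True)     # determination method
    qualifier.setTarget(pywikibot.ItemPage(repo, 'Q106429444'))      # Met Tagging Initiative
    claim.addQualifier(qualifier)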

Special flags and options

There are two Python dicts that bot operators can define to restrict the behavioral logic of the P180 additions.

  • do_not_depict - Define Wikidata QIDs you don't want the bot to add to any P180. The Met is refining its tagging so that tags don't reflect "instance of" information, such as portrait, landscape art, etc. So you can list a series of Q numbers to never add as depiction info.
Scale of work

There are roughly 600,000 works of art from The Met in their CSV file, with about 20,000 having a corresponding Wikidata item. We are working methodically through the list, starting with 2D artworks and smaller sets of object names to help evaluate our working methods.

Contact

Contact Andrew Lih (User:Fuzheado) with any issues. (April 2021) --Fuzheado (talk) 02:07, 28 April 2021 (UTC)


@Fuzheado: Seems like a good project. Couple small issues:

  1. Code doesn't properly handle deprecated pre-existing P180 statements.
  2. Why are you using qualifiers instead of references? I think references are a more appropriate way to store this data.
  3. You should probably use stated in (P248) instead of determination method (P459) and maybe also add the reference URL (P854) to the specific version of the csv file you are using on github (in case the file changes later).

BrokenSegue (talk) 03:32, 28 April 2021 (UTC)

Thanks for the feedback. Here are some responses:
  1. P180 statements in the item that are not from The Met should not be considered "deprecated." So they should stay. If you're talking about P180 statements that are attributed to The Met but are no longer in The Met database, you're right that isn't handled by this bot. That can be handled by a future "roundtripping" maintenance bot. The current role of this bot is to add. But I'll look into adding that logic in follow-up bots.
  2. As mentioned above, SDC does not currently have references. So we would have a peculiar situation where relating Met P180 info in SDC would be via qualifiers and on Wikidata with references. That seems to be a suboptimal situation. I suppose we could use both methods on Wikidata - both the qualifier and a reference and be a bit redundant. We do that with other things for artworks like collection/inventory number and inventory number/collection. Further discussion welcome.
  3. My understanding is that stated in (P248) is for references only, so since I went the qualifier route, I went with determination method (P459). As for pointing to an exact CSV - have we done that for other cases in Wikidata? It seems like it may be overly specific to implementation details, since the same info is in several different places from The Met, whether it's CSV or their API. I think just stating that it's part of a project is enough in this case, and the details pointed to in Metropolitan Museum of Art Tagging Initiative (Q106429444). But I'm open to seeing other solutions. - Fuzheado (talk) 04:30, 28 April 2021 (UTC)
Regarding deprecated statements, I mean that the code should not touch deprecated P180 statements already in Wikidata. Sorry, I'm not sure what "SDC" refers to here, so I can't speak to whether not using a reference is appropriate. I personally have added links to exact versions of files/datasources. It's potentially useful but not critical. If we do go with a reference and stated in but not a reference URL, I would suggest adding a retrieved timestamp. I would also suggest adding both a qualifier and a reference if for some reason a qualifier is needed. BrokenSegue (talk) 04:39, 28 April 2021 (UTC)
  • I guess it's tempting not to use Wikidata's data model because some other system doesn't support it, but we have had not-so-positive experiences with external contractors trying to place a non-Wikidata model into Wikidata. If there is a problem with the data model at Commons, this should be resolved there; here, use the reference section for references. --- Jura 11:19, 29 April 2021 (UTC)
  • Looks good in general (except the point above). --- Jura 11:19, 29 April 2021 (UTC)
  • Hi Lymantria and Jura1 - I'm fine with implementing reference/source statements in addition to the qualifier as this is a reasonable dual solution. The case for the qualifier approach is still there, as we can imagine a variety of methods for the addition of depicts (P180) statements that would also benefit from being in qualifiers in addition to this case, including but not limited to:
    • Tools - Utilities like ISA, Wiki Art Depiction Explorer, Wikidata Image Positions all add depiction info, and tracking these additions via a qualifier statement would be reasonable since a source/reference statement wouldn't be the right approach.
    • Machine learning and AI - Similarly, automated tools and techniques using machine learning and AI are being used already to help add metadata to images/items, and tracking these in the qualifier is likely the right approach, versus a reference statement.
I should note that this type of approach is not new – there are a number of fields relevant to cultural heritage and digital humanities in Wikidata that are repeated in multiple places, such as collection (P195) or inventory number (P217), in the interest of discoverability and serving multiple approaches to modeling the data set. This is a good example of another situation where a dual approach is justified. Thanks. -- Fuzheado (talk) 04:41, 11 May 2021 (UTC)
  • It's probably normal to think that a given reference one adds is special and should be used everywhere, otherwise one would probably not add it in the first place, but that shouldn't mean one needs to add the same data three times to WMF projects.
    The comparison with catalog/catalog code is helpful: there the qualifier is needed, because the information is split between the main statement and the qualifier. This is different here.
    If there are some other tools that don't work correctly with references, maybe it's time to fix them. If they are only used on Commons, it's irrelevant for this bot request (this is Wikidata, not Commons with different Wikibase features). --- Jura 06:47, 16 May 2021 (UTC)
  • Even on Commons, it seems to be a mere GUI issue: phab:T230315 (found this by chance when searching for something else). --- Jura 11:44, 16 May 2021 (UTC)

Mbchbot

Mbchbot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Mbch331 (talk • contribs • logs)

Task/s: Adding MovieMeter person ID (P9463) to entities that have a value for MovieMeter director ID (former scheme) (P1969), but not yet for P9463.

Code:https://public.paws.wmcloud.org/47625599/MovieMeter_Person_Id.py

Function details: The bot runs a SPARQL query to retrieve all items that should be converted, then opens the link for the P1969 value, follows the redirect, and from the new URL scrapes the value for P9463, which it then stores as a new claim on the original item, including the original full URL for the P1969 value as a source. --Mbch331 (talk) 19:49, 19 April 2021 (UTC)
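
A rough sketch of the redirect-following step; the PAWS script linked above is the actual code, the base URL below stands in for the P1969 formatter URL, and the assumption that the new person ID is the final path segment of the redirect target may not match MovieMeter's real URL scheme:

import re
import requests

OLD_DIRECTOR_URL = 'https://www.moviemeter.nl/director/%s'   # placeholder for the P1969 formatter

def new_person_id(old_director_id):
    # Follow the redirect from the old director URL and pull the new ID out of the final URL.
    r = requests.get(OLD_DIRECTOR_URL % old_director_id, allow_redirects=True, timeout=10)
    r.raise_for_status()
    match = re.search(r'(\d+)/?$', r.url)
    return match.group(1) if match else None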

looks good to me BrokenSegue (talk) 01:59, 21 April 2021 (UTC)
  • The property seems to be a duplicate, see Property_talk:P9463. --- Jura 10:28, 23 April 2021 (UTC)
    • Maybe not, let's see how it goes. --- Jura 11:14, 23 April 2021 (UTC)
@Mbch331: Please, run some test edits. Lymantria (talk) 06:24, 26 April 2021 (UTC)
For now I'm putting my request On hold, because there's a PFD for 2 of the 3 MovieMeter properties. If those get deleted, P9463 should be deleted as well, because the reasons to delete the other two also apply to P9463. Mbch331 (talk) 19:43, 28 April 2021 (UTC)

EdwardAlexanderCrowley (flood)

EdwardAlexanderCrowley (flood) (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: EdwardAlexanderCrowley (talk • contribs • logs)

Task/s: link 1000+ pages listed in User:Qnm/list1

Code: The code below leads to [8]

import mwclient as mwc

# UA, UN and PWD are the operator's user-agent string, username and password,
# defined elsewhere before this snippet runs.
wd = mwc.Site('www.wikidata.org', clients_useragent=UA)
wd.login(UN, PWD)
token = wd.get_token('edit', True)
wd.post('wbeditentity', format='json', token=token, new='item', errorformat='plaintext', uselang='zh-hans', assertuser='EdwardAlexanderCrowley', data='{"labels":{"zh":{"language":"zh","value":"Module:Data tables/dataM997"},"en":{"language":"en","value":"Module:data tables/dataM997"}},"sitelinks":{"zhwiktionary":{"site":"zhwiktionary","title":"Module:Data tables/dataM997"},"enwiktionary":{"site":"enwiktionary","title":"Module:data tables/dataM997"}}}', bot=1)

Function details: I copied some modules using User:CrowleyBot from enwikt to zhwikt. I need approval for linking these. Linking submodules is allowed by the policy, because Module:languages/data3/* are already linked.

If someone wants to do this for me, it's OK.

If admins request me to use CrowleyBot to do this task, then IPBE is needed, because I'm from mainland China, and only have global IPBE on the main account. --EdwardAlexanderCrowley (talk) 13:32, 20 April 2021 (UTC)

This doesn't seem to comply with standard bot policy. Please set a user-agent that is traceable back to you and respect maxlag. Multichill (talk) 10:44, 25 April 2021 (UTC)
The user-agent is set via UA ('EdwardAlexanderCrowley/0.0 (User:EdwardAlexanderCrowley)'). maxlag defaults to 3 as in [9]. If needed, I'll use time.sleep(2). EdwardAlexanderCrowley (talk) 05:49, 26 April 2021 (UTC)

InforegisterIDupdater

InforegisterIDupdater (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Irpropupdatebot (talk • contribs • logs)

Task/s: Updates Inforegister ID (P9321) prop on Wikidata items that include Business Registry code (Estonia) (P6518).

Code: Github

Function details: Inforegister ID (P9321) props are generated using data from an Excel spreadsheet received from the Inforegister internal database. Over 4,000 edits have been done using this bot. --Irpropupdatebot (talk) 06:28, 7 April 2021 (UTC)
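
A minimal sketch of the spreadsheet-driven mapping; the column names are placeholders, and the linked GitHub repository is the actual code:

import pandas as pd

def registry_to_inforegister(xlsx_path):
    # Map Business Registry code (P6518 value) -> Inforegister ID (P9321 value).
    df = pd.read_excel(xlsx_path)   # reading .xlsx files requires openpyxl
    return dict(zip(df['registry_code'].astype(str), df['inforegister_id'].astype(str)))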


ComplexPortalBot

ComplexPortalBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: TiagoLubiana (talk • contribs • logs)

Task/s: Add and update protein complex entities from the European Bioinformatics Institute Complex Portal platform.

Code: Code available at: https://github.com/lubianat/complex_bot

Function details: For a set of curated species, the bot will (in synchrony with Complex Portal curation):

  • Add items for macromolecular complexes absent on Wikidata
  • Add label, aliases, description and core statements (e.g., "instance of")
  • Link macromolecular complex items to their components via "part of" relations
  • Link macromolecular complex to Gene Ontology terms

For more information, details about the process are available at User:ProteinBoxBot/2020_complex_portal
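
For context, a minimal WikidataIntegrator sketch of writing one statement in append mode; the linked complex_bot repository is the actual code, P527 ("has part") is used here only as an example of the component link, and the credentials and component QID are placeholders:

from wikidataintegrator import wdi_core, wdi_login

def append_component(login, complex_qid, component_qid):
    # Add one component link, appending to whatever statements already exist on the item
    # rather than overwriting them.
    statement = wdi_core.WDItemID(value=component_qid, prop_nr='P527')
    item = wdi_core.WDItemEngine(wd_item_id=complex_qid, data=[statement],
                                 append_value=['P527'])
    item.write(login)

# login = wdi_login.WDLogin(user='ComplexPortalBot', pwd='...')   # placeholder credentials
# append_component(login, 'Q105777252', 'Q...')                   # e.g. CST complex plus a component QID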


--TiagoLubiana (talk) 17:06, 23 February 2021 (UTC)

 Support --Andrawaag (talk) 18:20, 23 February 2021 (UTC)

 Support Andrew Su (talk) 19:28, 4 March 2021 (UTC)

 Support --Jvcavv (talk) 20:47, 4 March 2021 (UTC)

 Support --SCIdude (talk) 08:28, 5 March 2021 (UTC)

 Support --Sulhasan (talk) 20:04, 9 March 2021 (UTC)

 Support --Bmeldal (talk) 15:18, 19 May 2021 (UTC)

Comment Sample edits are being run. The bot is doing only a couple at a time due to API rate limits for new users. TiagoLubiana (talk) 12:25, 8 March 2021 (UTC)

Comment Sample edits were successfully done. The bot is ready for scale-up. TiagoLubiana (talk) 14:13, 15 April 2021 (UTC)

  • Question: does this bot overwrite or delete existing information? If so, how, when and why? --- Jura 07:48, 16 April 2021 (UTC)
  • Strong oppose given the open questions. (This is to avoid it being prematurely approved before we actually have the questions sorted out.) --- Jura 07:48, 16 April 2021 (UTC)
  • @Jura1: The bot follows the default behaviour of https://github.com/SuLab/WikidataIntegrator and does not explicitly delete information. I'd gladly check that, but currently I am not able to run more test runs (repeated maxlags of over 40 seconds). I'll try again and do a couple more checks; thank you for the comment. TiagoLubiana (talk) 19:48, 16 April 2021 (UTC)
    • How about overwriting (or replacing)? --- Jura 19:58, 16 April 2021 (UTC)
      • @Jura1: It was doing that, indeed, which is not appropriate. So thank you very much for the catch! I have fixed the source code, and it does not overwrite or delete information anymore. It simply appends a new statement if it doesn't exist already. I've tested the behavior on CST complex (Q105777252). TiagoLubiana (talk) 14:00, 21 April 2021 (UTC)
      • @Jura1: Given the fix, would it be possible for you to lift the strong oppose? Best, TiagoLubiana (talk) 15:16, 22 April 2021 (UTC)
      • Can you do three test edits showing correct updates (please provide links to the diffs)? To me this is still overwriting that shouldn't happen (last edit in the item you linked as sample above). --- Jura 10:18, 23 April 2021 (UTC)
        • He is updating references. In my bot I'm not doing this but I would like to see a document/discussion discouraging this. --SCIdude (talk) 14:31, 23 April 2021 (UTC)
        • @Jura1: it is only overwriting references, indeed. In the last edit that you quoted (this), the bot has not removed the molecular function (P680) and the instance of (P31) statements that were added manually (they are the only unreferenced statements in the page). As @SCIdude: mentions, it would be nice to see a document on that. However, if you think it is important, I'll invest some time to remove that too. TiagoLubiana (talk) 21:58, 23 April 2021 (UTC)
          • I don't think it should be doing that. Maybe ask on Wikidata:Project chat if you think it's a good idea. --- Jura 17:10, 27 April 2021 (UTC)
            • @Jura1: can you elaborate a bit on why you object to this? In the bot edit, the reference is updated with a novel time stamp, after the bot updated that item. Complex Portal, like other curation databases, are living resources that are being kept up-to-date in line with novel insights. Complex Portal is the primary curation source which means any Wikidata entry based on this resource should mirror Complex Portal’s updates in line with its publication cycles to ensure compatibility of available data. The bot only updates or verifies a statement added by Complex Portal and not statements from other sources. Updating the references of the primary statements with each Complex Portal release keeps the Wikidata entries up-to-date and clean while all edits are kept in the history for users to refer back to if required. --Andrawaag (talk) 15:03, 19 May 2021 (UTC)
              • It's just consistent with the way most other data in Wikidata is updated. Statements are not meant to be continuously re-written as this is done in Wikipedia. Ranks and qualifiers are meant to keep track of validity, see Help:Ranks. If the resource isn't stable, maybe it shouldn't be imported at all. --- Jura 06:38, 20 May 2021 (UTC)
                The data was not changing, I think. Only the metadata. Do you mean you want at some point 500 references for the same statement, the most recent reference ranked "best"? Is that technically even possible at this moment? --Egon Willighagen (talk) 09:31, 20 May 2021 (UTC)
                @Jura1: That is not accurate. Statements are constantly being updated in Wikidata. There is even an API call, "wbeditentity", that updates a full item in one go, which is much less expensive on the API than updating one statement, rank or reference at a time. This is from a technical perspective, but even on the content level Wikidata is constantly in flux. Every time, for example, a new head of state or government is instated, the statement about the previous person is updated, e.g. [1]: a new statement is created with the new head of state and the previous statement is updated with qualifiers and sometimes new references. This has nothing to do with a resource being stable or unstable, but more with knowledge evolving. Other times new references emerge that need to be added to the reference blob of a statement as well. --Andrawaag (talk) 10:06, 20 May 2021 (UTC)
                • Technically I can edit your comment and re-write it to say something else, but that doesn't mean I should or that it's acceptable here or in Wikipedia to do so. Wikipedia does allow to rewrite an article fully based on new information and generally doesn't require to include a historical evolution of a topic. This is different from Wikidata, where Help:Ranking outlines the approach for historic information. It's correct that qualifiers can be added to statements with new references, but that doesn't mean previous information should be re-written or deleted entirely. Also, I'm aware that we had a few legacy bots (possibly operated by yourself or people collaborating with you) that haven't been thoroughly reviewed prior to starting to operate and occasionally lead to discussions with other users when they delete information or lead to requests for deletion of entire batches of items. I hope they have been fixed in the meantime. --- Jura 07:39, 31 May 2021 (UTC)

 Support --Egon Willighagen (talk) 15:37, 19 May 2021 (UTC)

Comment Not sure how the digression about the API above is relevant here; just to summarize the basics: items, statements and references (such as the retrieved date) shouldn't be deleted, removed or overwritten. New information can be added: new items, new statements to existing items, new qualifiers and references to existing statements, etc. --- Jura 07:39, 31 May 2021 (UTC)

@Jura1: Just to be clear. You state there is consensus that a new reference should be added even if the same bot retrieves the same data from the same database one day later? If so, there is no reference for your statement. Your refusal of support for this bot seems to be based on unsupported claims and thus to be invalid. --SCIdude (talk) 10:01, 31 May 2021 (UTC)
  • If new information is added, then a reference should be provided.
    You shouldn't "touch up" all items and statements merely because you run the bot on a daily basis on a series of items without adding anything. I'm not aware of any consensus for such "touch-up" bots operating on Wikidata. --- Jura 10:07, 31 May 2021 (UTC)
@Jura1: Will you withdraw your refusal of support if the bot owner pledges to not "touch up" references in the future? --SCIdude (talk) 10:12, 31 May 2021 (UTC)
The question is less whether I (or anybody else) support or oppose the bot, but what concerns (or arguments) were advanced. As far as I'm concerned, I think with this all I raised have been addressed, and so it could move ahead. --- Jura 10:38, 31 May 2021 (UTC)

So9qBot

So9qBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: So9q (talk • contribs • logs)

Task/s:

  • import DOIs found in Wikipedia
  • import ISBNs found in Wikipedia
  • import JSTOR IDs found in Wikipedia

Code: https://github.com/dpriskorn/asseeibot

Function details: Find DOIs and ISBNs and upload them using the sourceMD tool (if possible; if not, read from the CrossRef API and mimic sourceMD). Only upload DOIs that are found in Wikipedia and missing in Wikidata.

The bot is gonna use WikidataIntegrator (which defaults to and respects maxlag=5) if sourceMD cannot be used.
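
For illustration, a minimal sketch of the "missing in Wikidata" check for a single DOI via the public SPARQL endpoint; the linked asseeibot repository is the actual code, and the upper-casing follows the usual P356 format convention:

import requests

def doi_exists_on_wikidata(doi):
    # ASK whether any item already carries this DOI (P356); values are stored upper-case.
    query = 'ASK { ?item wdt:P356 "%s" }' % doi.upper()
    r = requests.get(
        'https://query.wikidata.org/sparql',
        params={'query': query, 'format': 'json'},
        headers={'User-Agent': 'doi-check-sketch/0.1 (example only)'},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()['boolean']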

See also https://www.wikidata.org/wiki/Wikidata:Bot_requests#request_to_import_DOI_and_ISBN_as_items_when_present_in_any_Wikipedia_article_(2021-02-11) --So9q (talk) 13:59, 5 April 2021 (UTC)

  • So this is gonna make a ton of new items? Can you show an example? How many new items are we talking about per day? How will this code attempt to determine if an item already exists? BrokenSegue (talk) 13:36, 8 April 2021 (UTC)
@BrokenSegue: Yes, this will create a lot of new items, one for every DOI in WP. I have no estimate of how many, but say 1-10 million (only talking DOIs; I have not investigated the ISBNs, but they are probably a few million also)? It will immensely help WP editors to cite scientific articles using the CiteQ template (after the article has been used in any other Wikipedia and found by the bot). Currently the script runs on my local machine finding DOIs (not uploading yet), and that results in about 40 new DOIs an hour looking only at enWP. These DOIs are all missing in WD. Extrapolating, that means the bot will create about 28,800 new items per month. You can install and run it yourself if you feel like it. I posted instructions in the git repo for how to install the requirements. :)--So9q (talk) 05:19, 12 April 2021 (UTC)
  • Please perform some 50-250 test edits. Lymantria (talk) 14:12, 8 April 2021 (UTC)
@Lymantria: I will finish writing the upload part (since sourceMD is broken and not viable) and ping you when the test edits have been done.--So9q (talk)
It would be a really big benefit if you can import books from ISBNs, though the SourceMD tool provides minimal information (as of 2019) and other sources would be needed.--GZWDer (talk) 15:33, 14 April 2021 (UTC)
The best source for ISBNs that I have found is the website of OCLC; an example of a result for 978-0-486-61272-0. The title, author and year are not copyrightable under US copyright, so we can scrape it. I might get blocked though, but we will see.--So9q (talk) 19:03, 18 April 2021 (UTC)
Before reusing any code from the SourceMD tool, please be aware of the serious issues described in its GitHub issues.--GZWDer (talk) 14:29, 18 April 2021 (UTC)
@GZWDer: Thanks for the warning!--So9q (talk) 16:37, 18 April 2021 (UTC)
Strong oppose we don't need more scientific papers; we should be moving them somewhere else and deleting them here. Multichill (talk) 10:35, 25 April 2021 (UTC)
@Multichill: Since you propose "moving them somewhere else", I wonder whether you are interested in working towards that becoming a reality? Maybe a new partnership project with a group of universities and/or a new grant from the Sloan Foundation to WikiCite/Shared Citations? Based on the Telegram group and the wiki pages, the WikiCite project seems very stale to me.
I actually think that from a narrow Wikipedian perspective most of the 34 million scientific articles imported by others (mostly by @Daniel Mietchen: and @GZWDer: from what I have seen) are pretty useless. On the other hand, a scientific article usually has references, and if we recursively import references we might end up with 34 million articles or more even if we choose to delete all the ones that are not specifically mentioned in Wikipedia (or a recursive reference to one of those). There are now 4 million references to scientific articles with an ID in enWP, and according to my small sampling with my bot my estimate is that 60-70% of those are in Wikidata already.
I'm pretty sure that the 34 million items, with a total of maybe 500 million triples, that scientific articles now consist of point to a deeper infrastructure issue with Blazegraph, even if they were to be "moved" or deleted. Deleting these 500 million triples will not solve the problem for long, as imports of all the world's patents (or just the ones already mentioned in Wikipedia) or all the world's registered beaches or similar might fill up the gap pretty fast.
@Lydia_Pintscher_(WMDE): recently mentioned the issue with Blazegraph in the Telegram chat. She pointed out that in her opinion it's a fact that we are now at the limit of the number of nodes that Blazegraph was designed for. I interpret her statements in the chat this way: adding more statements/items on the order of a couple of million (as this bot proposal is all about) might pose a risk to the whole WDQS infrastructure. If that is correct, then this is a big infrastructure problem and one I suggest we put a lot of effort into solving, no matter the future of the scientific articles in Wikidata.
See the recent WDQS disk space issues and the high-priority epic bug about finding an alternative to Blazegraph (open since 2018), which had seen very little activity until recently (after I bumped it :)). See also this recent comment about Blazegraph from a WMF employee.
IMO the whole idea of Wikidata is to support other Wikimedia projects with centralized structured data, which is exactly what this bot job is about if you ask me, but I can see that in the bigger picture a fruitful WikiCite project that can easily be linked from special(?) properties in Wikidata might be a better solution.
I invite others to join this discussion and state their views.--So9q (talk) 17:51, 4 May 2021 (UTC)
We have no rush. Let's see what comes from the Shared Citations project, which seems anything but stale. Ainali (talk) 21:19, 4 May 2021 (UTC)
Shared Citations is not intended to be a "bibliographic commons" (i.e. a collection of all books and articles ever published), while some users have proposed that Wikidata should be.--GZWDer (talk) 02:50, 5 May 2021 (UTC)
We use a common knowledge base for a number of different purposes (such as the Cite Q template), and Scholia will be more usable if we have a complete corpus of papers (currently we have not even completed 20% of them). Also @Multichill: what is the benefit of importing all artistic works compared with papers?--GZWDer (talk) 18:02, 4 May 2021 (UTC)
Why are you asking me about importing all artistic works? Are you planning to do so or is this a straw man? Multichill (talk) 20:12, 4 May 2021 (UTC)
You may find it beneficial to do so (and indeed commons may make good use of them), while many others consider importing all articles useful. (In my opinion only, this means 200 million new items.)--GZWDer (talk) 02:26, 5 May 2021 (UTC)
Strong oppose. What Multichill said. Also, this "Import It All!" mentality has come up repeatedly on the Wikidata Telegram channel (the "it" changes, but the idea is always the same), and he has been told every time that it is not a good idea because of the limitations of hardware and resources, the fact that these are always imports without any kind of maintenance, etc., yet he refuses to listen. At this point this turned long ago into a game of pigeon chess that the community doesn't need to be playing. -Yupik (talk) 15:27, 25 April 2021 (UTC)
BTW: See phab:T281854 - it is proposed that WMF introduce new endpoints dedicated to scientific articles.--GZWDer (talk) 18:03, 4 May 2021 (UTC)
 Oppose Hold off until we have more clarity of direction around the Shared Citations proposal. - PKM (talk) 22:19, 4 May 2021 (UTC)
While I like the idea of shared citations, the goal is different: m:WikiCite/Shared_Citations#Database - It is not a place to compile completed sets of citation corpora (also known as "stamp collecting") or an attempt at a universal "bibliographic commons". Plus, Shared Citations does not cover the relationships between articles (how they reference each other).--GZWDer (talk) 02:29, 5 May 2021 (UTC)


New section

(I'm commenting in a new section as this was already approved.) If there is a way to add a different instance of (P31) than landmark (Q2319498), that would be great. Maybe building? --- Jura 20:11, 2 April 2021 (UTC)

Unfortunately, there is no trivial way to distinguish the different cases for heritage sites classified officially as "monument of urban planning and architecture". Such objects may be parks or gardens, some constructions like bridges, mansions, city quarters etc., not only buildings. Avsolov (talk) 22:23, 2 April 2021 (UTC)
Ok, maybe it's possible to do a search-and-replace afterwards.
I'm not sure if you noticed, but we try (tried?) to avoid adding "historic monument" as P31 and use the dedicated property instead. --- Jura 07:38, 3 April 2021 (UTC)
You mean "historic site" and "heritage designation", don't you? Well, Russian official registry provides the following attributes for heritage objects: "type" (or "typology") and "category of protection". I am going to match "type" with Property:P31 and "category of protection" with Property:P1434. Do you see other possibilities? Avsolov (talk) 12:04, 3 April 2021 (UTC)
Yeah, heritage designation. Given your data, I think the P31 values other than landmark should be fine. --- Jura 10:45, 4 April 2021 (UTC)
Then, what would you recommend as P31 values instead of "landmark" in the case of "monument of urban planning and architecture"? Avsolov (talk) 18:13, 4 April 2021 (UTC)
This request briefly describes planned actions. The community of Russian Wikivoyage has more extended discussion concerning this task.
More details regarding Property:P1434: we plan to reflect protection category onto this property. Protection category may be represented as Q105835774 (tentative cultural heritage site), Q23668083 (federal cultural heritage site), Q105835744 (regional cultural heritage site), Q105835766 (local cultural heritage site), or Q105835782 (candidate heritage site). Avsolov (talk) 12:16, 3 April 2021 (UTC)
  • The trial edits actually look quite good. Let's see how it goes. --- Jura 10:45, 4 April 2021 (UTC)

Review of edits

@Mike Peel: thanks for the heads up on this. I reviewed some of the edits done since the approval. While I don't have much of an opinion on labels that are not in Latin script, for any edit it might be worth mentioning the previous label in the edit summary.

Sample: [19]

Current edit summary
Changed label, description and/or aliases in en: Remove text in brackets from en label
Suggested new summary
Changed label, description and/or aliases in en: remove text in brackets from label "Alan McLean (New Zealand cricketer)"

"Changed label, description and/or aliases in en" is the automatic part, so I suppose it can't be changed but gets internationalized.

Also, Master of the Codex Manesse (Foundation Painter) (Q59285703) shouldn't have been done. I don't think it's much of an issue, especially as it lacked notname (Q1747829), which would be sufficient to identify/skip it. --- Jura 08:04, 16 April 2021 (UTC)

@Jura1: I prefer not to quote the labels in the edit summary, but can do if needed. You see the new label anyway when you are browsing in that language. I don't understand why Master of the Codex Manesse (Foundation Painter) (Q59285703) is an exception? Thanks. Mike Peel (talk) 08:07, 16 April 2021 (UTC)
  • @Mike Peel: With the correct P31, Master of the Codex Manesse (Foundation Painter) (Q59285703) would have been part of [20] who are mostly named for whatever work of theirs is known. --- Jura 08:25, 16 April 2021 (UTC)
  • Given that the bot does edits in multiple languages, people who review won't see all labels. If you prefer, you could add the new label in the summary (this would be closer to what Wikibase does automatically). This could be accompanied by some note, e.g.
Suggested new summary
Changed label, description and/or aliases in en: label "Alan McLean" after removing text in brackets
Personally, I prefer the earlier suggestion. --- Jura 08:25, 16 April 2021 (UTC)
Ideally, if only the label is edited, the automated part would state that, but maybe this is not something the bot operator can modify. Avoids having to repeat it afterwards. Sample:
Suggested new summary
Changed English label: Alan McLean // task: remove text in brackets
Not sure about the ideal way to separate the label from the explanation (I used "//" here). I considered using square brackets ("[]"), but, at least for this task, it could be confusing.
If the default summary can't be changed, it would be:
Suggested new summary
Changed label, description and/or aliases in en: label "Alan McLean" // task: remove text in brackets
The same could be helpful for other tasks. @Mike Peel: --- Jura 09:51, 18 April 2021 (UTC)
@Jura1: I went with the first option in the end, code changed. Thanks. Mike Peel (talk) 12:51, 25 April 2021 (UTC)
Suggested new summary
Changed label, description and/or aliases in en: Alan McLean, #task 21: remove part of label in brackets
  • @Mike Peel: Thanks, sounds good. After doing some experiments with edit summaries and looking at the ones Magnus' tools add (", #"), the above would probably be closest to his. --- Jura 17:15, 27 April 2021 (UTC)

QuebecLiteratureBot

QuebecLiteratureBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Mchlggnn (talk • contribs • logs)


Task/s: Populate Wikidata with information about the works published by Canadian writers.

Code: Will be added later.

Function details: The bot will create new items for each book published by an author (very few exist at this moment in Wikidata).

Information about published books will mainly come from BAnQ, which is the legal deposit institution for works published in the territory of Quebec (Canada), and the Infocentre littéraire des écrivains. The added data will distinguish between the work, which is the abstract concept of the book, and every manifestation of this work, i.e. its published editions, each associated with the following information: date of publication, publisher, number of pages, etc.

For an example, you can consult the links we added to the writer Sylvain Trudel. You can see that he is the author of the following two works:

The first one is associated with three editions, using the property has edition or translation:

The first two correspond, respectively, to the first edition of the original book and a later edition with another publisher. The last one is an English translation. Note that each manifestation (that is, an edition) is indicated with a date that distinguishes it from the work itself.


--Michel Gagnon (talk) 01:05, 27 February 2021 (UTC)

  • this needs more detail to be approved. BrokenSegue (talk) 14:01, 27 February 2021 (UTC)
  • I added some detail and a complete example. Mchlggnn 13:13, 2 March 2021 (UTC).
  • So essentially all works and editions subject to dépôt légal in Quebec would be imported?
  1. How many would this be?
  2. How many would be added each month?
I'm not sure Wikidata is up to that. Maybe input from Wikidata:Contact the development team should be sought. --- Jura 08:56, 8 March 2021 (UTC)

This bot request looks stale and the numbers look scary so  Oppose for now and this request should probably just be closed. @Mchlggnn: you need to respond to prevent this. Multichill (talk) 10:48, 25 April 2021 (UTC)

taxonbot

taxonbot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Thomasstjerne (talk • contribs • logs)

Task/s: Add BOLD Systems taxon IDs to taxon pages

Code: https://github.com/thomasstjerne/taxon-wikibot

Function details: Traverses the BOLD checklist dataset in GBIF, which is matched to the GBIF Backbone taxonomy. For each record in the BOLD checklist, the corresponding record in the GBIF backbone is used to locate the taxon in Wikidata (through the GBIF ID), and then the BOLD Systems taxon ID is inserted. --Thomasstjerne (talk) 20:26, 8 February 2021 (UTC)
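The linked repository is not quoted here, but a minimal pywikibot sketch of the described matching step might look as follows, assuming GBIF taxon ID is P846 and BOLD Systems taxon ID is P3606; the identifiers passed at the bottom are made-up placeholders.

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def add_bold_id(gbif_id, bold_id):
    """Locate the taxon item via its GBIF taxon ID (P846) and add a BOLD Systems taxon ID (P3606)."""
    query = 'SELECT ?item WHERE {{ ?item wdt:P846 "{}" }}'.format(gbif_id)
    items = list(pagegenerators.WikidataSPARQLPageGenerator(query, site=repo))
    if len(items) != 1:
        return  # no match or an ambiguous match: skip rather than guess
    item = items[0]
    item.get()
    if 'P3606' in item.claims:
        return  # the item already has a BOLD Systems taxon ID
    claim = pywikibot.Claim(repo, 'P3606')
    claim.setTarget(bold_id)
    item.addClaim(claim, summary='Add BOLD Systems taxon ID matched via GBIF backbone')

# Placeholder identifiers, for illustration only.
add_bold_id('5219404', 'ABCDE1234')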

Maybe you should reconsider your bot's name. --Succu (talk) 21:17, 8 February 2021 (UTC)
  • You have fewer than 100 edits; is it safe to give you control of a bot flag? Also, that bot username isn't registered, which is odd. BrokenSegue (talk) 01:04, 9 February 2021 (UTC)
  • We already have a bot named TaxonBot, so I would prefer another name. --Ameisenigel (talk) 05:29, 9 February 2021 (UTC)
  • Should I update the bot name on this page (to keep discussion history) or should I create a new request page for the new bot name? Thomasstjerne (talk) 18:05, 10 February 2021 (UTC)
    Please update the name, and someone would move the page--Ymblanter (talk) 20:18, 10 February 2021 (UTC)
    • It seems this task is already taken by SuccuBot. Lymantria (talk) 07:17, 24 February 2021 (UTC) And by yourself, without a bot flag. Lymantria (talk) 09:11, 26 February 2021 (UTC)
  • Quite messy: https://www.wikidata.org/w/index.php?title=Q14659481&action=history . Had SuccuBot been approved for this task in the meantime? --- Jura 09:04, 8 March 2021 (UTC)
    • yes. --Lymantria (talk) 06:14, 10 March 2021 (UTC)
      • If it had been coordinated with Thomasstjerne, fine. Otherwise I find it somewhat suboptimal. --- Jura 07:06, 10 March 2021 (UTC)
        • @Jura: „suboptimal” was that Thomasstjerne was running a faulty script with his user account. --Succu (talk) 21:06, 10 March 2021 (UTC)
      • If the fault was that he didn't check if someone else who discussed it here did it in the meantime, I think it's relatively minor. Besides, he figured out how to fix it and did fix it. --- Jura 11:38, 11 March 2021 (UTC)
  • @Thomasstjerne: Please register the bot and let it perform some test edits. Lymantria (talk) 06:18, 24 March 2021 (UTC)

MarcoBotURW

MarcoBotURW (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: MAstranisci (talk • contribs • logs)

Task/s: I am a PhD student working on migrant authors and their narratives. I am using Wikidata to find them and their nationalities, but I noticed that few of them have this information, so I want to create new claims.

Code: I don't have a repository yet, but I aim to create one to publish the code.

Function details: I use Python. I would like to create these claims by referring to lists of authors by ethnicity or nationality.
Moreover, I found that the EUROVOC IDs (namely, P5437) point to an older version of the EUROVOC dataset, so I would like to update them. Furthermore, two integrations could be proposed: insert the country of birth as the first element of the field place of birth (P569); reorganize the country property of a place (P17) so that the actual country is the first label.


GZWDer (flood) 6

GZWDer (flood) (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: GZWDer (talk • contribs • logs)

Semi-automatic edits using various tools, with the following conditions:

  • The relevant WikiProject will be notified for any task with more than 1000 edits; exceptions:
    • Fixing existing bot errors (from this bot's future or prior tasks), and bot errors by others (with a notice to the user who caused them, if the error is likely to recur)
    • Any requests posted by others at Wikidata:Bot requests (but I will wait some days, using common sense, to allow others to comment)
  • A discussion with a clear statement of the intended bot work will be started on the relevant WikiProject talk page or in Project chat seven days before the proposed edits if 1. more than 100,000 edits are involved; 2. more than 10,000 items will be created; or 3. the task involves creating new lexemes, forms or senses. A new bot approval may be required if recommended by other users in the discussion. Exceptions:
    • Fixing existing bot errors (from this bot's future or prior tasks)
    • Fixing bot errors by others, with the agreement of that user
    • If there is consensus elsewhere to do the task

In all cases the bot may do no more than 100 edits as a demo.

Code: varies among tasks

Function details:

Created as a request from User_talk:GZWDer. For a long time, this account has acted as a dedicated semi-automatic editing account to make edits of the main account more

Any issues about previous edits should be reported to User:GZWDer/issues, not here. When raising a concern, be prepared to respond to replies to your responses.--GZWDer (talk) 20:03, 12 January 2021 (UTC)

@Multichill, So9q, Jura1: Please notice other participants of Telegram chat.--GZWDer (talk) 20:04, 12 January 2021 (UTC)
Done. Thanks for taking the time to write this and prepare the cleanup. I wish you good luck.--So9q (talk) 20:35, 12 January 2021 (UTC)
I see now that you requested rights to modify other objects unrelated to the cleanup. I suggest you remove that and save it for later, until the cleanup is done and the community's trust in you has been restored. Also, it might be a good idea to change the request title to clearly state "cleanup". I hope that a successful cleanup will go a long way towards clearing your reputation. I suggest you ask previous critics on their talk pages to comment on this proposal. A sincere apology for the actions that got you blocked earlier would probably also help rebuild trust. ;-)--So9q (talk) 20:40, 12 January 2021 (UTC)
I do need to collect (from other users) what needs to be cleaned up at User:GZWDer/issues before any specific action. I will avoid any edits that are potentially controversial.--GZWDer (talk) 21:45, 12 January 2021 (UTC)
I asked the operator to not do any edits with an unauthorized bot and I shared something about trust "A bot flag is a statement of trust: Trust to do correct edits, trust to not abuse it, trust to fix issues, trust that you will clean up the mess that might be caused by a robot, etc. It looks like multiple people lost that trust in you to the point that the authorization was revoked. It's going to be hard to regain that trust. Running an unauthorized bot (or running a bot under your main account) will only make matters worse."
After that message the operator decided to edit anyway so I changed the block to a complete one (already had a partial block for months). I lost trust in this operator. I'm not sure this operator should operate any bots. Multichill (talk) 11:52, 13 January 2021 (UTC)
+1 on Multichill. Also I didn't like the "please write your complaints in my dedicated subpage" approach. I might not be assuming good faith, and I'd be very happy to be disproved, but it sounded to me very much like GZWDer didn't want to understand what was the problem with their edits. --Sannita - not just another it.wiki sysop 13:00, 13 January 2021 (UTC)
@Multichill, Sannita: Let's go ahead and draft a plan to fix the issue. I did identify some issues at that subpage, but I am not sure whether others may find more. Until then I will refrain from running any bots or scripts (approved or not, under any account) not related to fixing issues, but (unless I provide a list for others to work on, or run scripts on my main account) this does need a dedicated usable bot account (whether it has a bot flag or not).--GZWDer (talk) 13:10, 13 January 2021 (UTC) struck 16:19, 8 February 2021 (UTC)
Note: I have collected all issues from threads in 2020 on the issue page, but I am not sure if I missed any.--GZWDer (talk) 13:32, 13 January 2021 (UTC)

Note: I have revised the bot task description. Previously others recommended that I only do the cleanup, but I believe I have done all I can do without a discussion (though some things need discussion about how to fix them).--GZWDer (talk) 16:19, 8 February 2021 (UTC)

Thanks for revising the description. I  Oppose any bot work from you until everything from before is cleaned up (if you want to use a bot for cleanup, please make a clear request for that).
I would like a clear bot request for every single source you wish to import from after the cleanup is done. Broad blanket requests are not in the best interest of Wikidata IMO. When you are in good standing, making and getting a bot request approved should not take long. If it does, it's because it needs discussion, and that's a good thing. I want high-quality data in WD, and low-quality imports are not my preferred way to get to that goal. So9q (talk) 09:37, 9 February 2021 (UTC)
@So9q: "until everything from before is cleaned up" - I already closed many of issues in User:GZWDer/issues (which I can fix unilaterally). Some issues needs discussion. For the second point, see the ongoing discussion at Wikidata:Project_chat#QuickStatement_Approval? (my opinion is described in task description).--GZWDer (talk) 12:46, 9 February 2021 (UTC)

JarBot 5

JarBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: جار الله (talk • contribs • logs)

Task/s: Adding "Monolingual text" (P2096) to images (P18) from arwiki.

Code:pywikibot

Function details:Hello, we use infobox based on wikidata and we want to remove duplicate file link from the article but we don't want to lose the comment.1, 2. --جار الله (talk) 05:40, 19 December 2020 (UTC)
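Not the operator's actual code, but a minimal pywikibot sketch of adding such a caption as a media legend (P2096) qualifier on an image (P18) statement; the item ID (the sandbox item) and the Arabic caption are placeholders.

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def add_media_legend(qid, caption_ar):
    """Add an Arabic media legend (P2096) qualifier to the item's image (P18) claim."""
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    for claim in item.claims.get('P18', []):
        if 'P2096' in claim.qualifiers:
            continue  # this image already has a legend
        qualifier = pywikibot.Claim(repo, 'P2096')
        qualifier.setTarget(pywikibot.WbMonolingualText(caption_ar, 'ar'))
        claim.addQualifier(qualifier, summary='Add Arabic media legend taken from the arwiki infobox')
        break

# Placeholder example on the sandbox item.
add_media_legend('Q4115189', 'صورة توضيحية')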

  • Sorry if I'm confused I think there's maybe a language barrier here. But is the text you are importing licensed under the public domain? If not you cannot import it. Right? BrokenSegue (talk) 17:01, 19 December 2020 (UTC)
@BrokenSegue: No, the text isn't licensed under the public domain. I heard "mass import" is unacceptable, but importing a few might be reasonable. The task includes only around 1,700 texts.--جار الله (talk) 17:59, 19 December 2020 (UTC)
@جار الله: I'm no copyright expert, but you're gonna have to argue the text is simple enough to not be copyrightable. I cannot comment as I do not speak Arabic. Otherwise I have no objections and support. BrokenSegue (talk) 18:01, 19 December 2020 (UTC)
@BrokenSegue: Now there are fewer than 1,500, including 130 where the text is the same as the label or the page name. If all 1,500 cannot be imported, at least we can import the monolingual text that is the same as the label or the page name, since that is in the public domain.--جار الله (talk) 05:23, 30 December 2020 (UTC)
If the legend is the same as the label or page name, I don't think it's needed as a qualifier on Wikidata. --- Jura 09:13, 8 March 2021 (UTC)

Cewbot 4

Cewbot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Kanashimi (talk • contribs • logs)

Task/s: Import new articles from online resources.

Code: https://github.com/kanasimi/wikibot

Function details: Please refer to Wikidata:Bot_requests#weekly import of new articles (periodic data import). The task will import new articles from PubMed Central, about 30K articles every week. Imports from other resources may follow in the future. --Kanashimi (talk) 03:49, 27 November 2020 (UTC)

PubMed ID (P698) will be used to avoid duplicates for articles from PubMed Central. For other resources, the identifier, article title and author(s) will be checked. --Kanashimi (talk) 20:04, 27 November 2020 (UTC)
  • 1. You should check the DOI too (but some articles do not have a DOI). 2. What source will you use to resolve the authors? Many do not provide enough information (e.g. an ORCID) to resolve them.--GZWDer (talk) 05:09, 1 December 2020 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

  • Thanks for doing these. I don't think complex author resolution is needed, but if it can be done, why not. OTOH, inclusion of the journal or other publication venue would be useful. Previous imports sometimes skipped them when an item wasn't created (meaning the bot or its operator needs to create one when it is encountered). User:Research_Bot/issues lists a few past problems. GZWDer's talk page has a few others. --- Jura 17:12, 1 December 2020 (UTC)
@Jura1: Thank you. User:Research_Bot/issues is very useful. @GZWDer: I will also try DOI (P356). If there is no information about the author(s), I will skip the check. --Kanashimi (talk) 00:24, 2 December 2020 (UTC)
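A minimal sketch of the duplicate check discussed above (not Cewbot's actual code, which lives in the linked JavaScript repository; this is a Python/pywikibot illustration with placeholder identifiers) that looks up PubMed ID (P698) and DOI (P356) before creating an item:

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def article_item_exists(pmid=None, doi=None):
    """Return True if an item with this PubMed ID (P698) or DOI (P356) already exists."""
    clauses = []
    if pmid:
        clauses.append('{{ ?item wdt:P698 "{}" }}'.format(pmid))
    if doi:
        clauses.append('{{ ?item wdt:P356 "{}" }}'.format(doi.upper()))  # DOIs are stored upper-case on Wikidata
    if not clauses:
        return False
    query = 'SELECT ?item WHERE { ' + ' UNION '.join(clauses) + ' } LIMIT 1'
    return any(pagegenerators.WikidataSPARQLPageGenerator(query, site=repo))

# Placeholder identifiers, for illustration only.
if not article_item_exists(pmid='12345678', doi='10.1000/example'):
    pass  # safe to create a new item for this article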

Datatourismebot

Datatourismebot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Conjecto (talk • contribs • logs)

Task/s: Update of the property Datatourisme ID during the internal reconciliation process

Code: We will mainly use the Wikidata Toolkit java library

Function details: --Conjecto (talk) 23:14, 23 November 2020 (UTC)

Fab1canBot

Fab1canBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Fab1can (talk • contribs • logs)

Task/s: Adding country to islands that have no country

Code:

Function details: The bot takes the island's country from its Wikipedia pages in other languages and adds it to Wikidata --Fab1can (talk) 14:00, 1 November 2020 (UTC)

romedi 1

romedi 1 (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Scossin (talk • contribs • logs)

Task/s: Add the relation "?entity is a medication" (?entity wdt:P31 wd:Q12140). I detected missing statements for molecules in commercialized drugs; for example, https://www.wikidata.org/wiki/Q353551 is a medication.

Code: https://github.com/scossin/RomediApp https://www.romedi.fr

Function details: addMedicationStatement(entity): RDFstatement --Scossin (talk) 13:52, 24 October 2020 (UTC)
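The linked repository is the Romedi application rather than the bot script itself, so here is only a minimal pywikibot sketch of what the Wikidata side of addMedicationStatement could amount to (Q353551 is the example item from the request above; this is an illustration, not the requester's implementation):

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()
MEDICATION = 'Q12140'

def add_medication_statement(qid):
    """Add "instance of (P31) medication (Q12140)" if the item does not already state it."""
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    for claim in item.claims.get('P31', []):
        target = claim.getTarget()
        if target and target.id == MEDICATION:
            return  # already stated as a medication
    claim = pywikibot.Claim(repo, 'P31')
    claim.setTarget(pywikibot.ItemPage(repo, MEDICATION))
    item.addClaim(claim, summary='Add missing medication statement')

# Example item mentioned in the request.
add_medication_statement('Q353551')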

Please register the bot and run some 50-250 test edits. Lymantria (talk) 09:55, 1 November 2020 (UTC)

FischBot 8

FischBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Pyfisch (talk • contribs • logs)

Task/s: Remove date of birth, date of death statements sourced only by VIAF.

Code: not public at this point

Function details: The bot works from a list of VIAF records linked to Wikidata items. For each item it removes all date of birth (P569) and date of death (P570) statements with the source stated in (P248): Virtual International Authority File (Q54919). In case the statement has additional sources, only the reference is removed.

I already removed (edits) those dates from VIAF that were marked "flourished". They were wrongly imported as dob/dod. Other dob/dod statements imported from VIAF may be correct, but VIAF is not a suitable source as it discards date information found in other authority files and incorporates information from Wikidata. Some common errors are: missing circa, wrong precision (year instead of century), flourished dates not marked as such and more. In the future the dates should be added directly from the relevant authority control file.
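The code is not public, but the described logic could look roughly like the pywikibot sketch below (the reference handling follows pywikibot's claim.sources structure; this is an illustration, not the bot's actual implementation):

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()
VIAF = 'Q54919'  # Virtual International Authority File

def clean_viaf_dates(qid):
    """Remove P569/P570 statements sourced only by VIAF; otherwise drop just the VIAF reference."""
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    for prop in ('P569', 'P570'):
        for claim in item.claims.get(prop, []):
            # claim.sources is a list of reference blocks, each a dict: property -> [source claims]
            viaf_refs = [ref for ref in claim.sources
                         if any(s.getTarget() and s.getTarget().id == VIAF
                                for s in ref.get('P248', []))]
            if not viaf_refs:
                continue
            if len(viaf_refs) == len(claim.sources):
                # VIAF is the only source: remove the whole statement
                item.removeClaims(claim, summary='Remove date of birth/death sourced only by VIAF')
            else:
                # the statement has other sources: remove only the VIAF reference block(s)
                for ref in viaf_refs:
                    claim.removeSources([s for group in ref.values() for s in group],
                                        summary='Remove VIAF reference')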

Some examples of dubious or wrong dates from VIAF (query):

@Magnus Manske: you originally added most of these statements. @Jura1, Epìdosis: from prior discussion at Wikidata:Bot requests.

If there is no objection to the removal of these statements I will start the bot on Friday. --Pyfisch (talk) 23:40, 5 October 2020 (UTC)

  • Strong support --Epìdosis 06:19, 6 October 2020 (UTC)
  • Thanks for doing these. Most helpful. From the above query, I checked a few rdfs exports at VIAF. They generally have the date and one or several sources where it could come from. Generally, it can be found on one of them. Sometimes this is LoC or idref (both tertiary sources), but it could also be ISNI or dbpedia, which would probably make VIAF a quintary source. Obviously, others can have the same problem, e.g. a LoC entry has several references without the dates being attributed to one of them. To sum it up: I'd also deprecate (if no other ref is present) or remove these references/statements. --- Jura 10:40, 6 October 2020 (UTC)
    BTW, when we will import dates from VIAF members, the first ones I would consider are the following: GND ID (P227), Library of Congress authority ID (P244), Bibliothèque nationale de France ID (P268), IdRef ID (P269). --Epìdosis 22:34, 7 October 2020 (UTC)
  • @Magnus Manske: would these be re-imported by some tool? --- Jura 10:40, 6 October 2020 (UTC)
I really fear these statements would be (at least in part) reimported by @Reinheitsgebot: from MnM catalog 2050. An option should be added in MnM: it should be possible to mark a catalog as not suitable for the automatic addition of references based on it; this option would also be very useful for CERL Thesaurus ID (P1871) (= catalog 1640), which isn't an independent source either, and for other catalogs. --Epìdosis 12:48, 6 October 2020 (UTC)
  • Unfortunately  Support As explained above, and as I have seen in items, there are too many bad claims in this import from VIAF. --Shonagon (talk) 16:25, 6 October 2020 (UTC)
  • Comment After some thought, I think it's preferable to keep the statements that were correctly imported from VIAF and only deprecate them when the statements are known to be incorrect. VIAF's approach isn't much different from the other tertiary sources mentioned above, i.e. LOC, CERL or GND would be preferable with their secondary sources, notably for GND, which has become a wiki.
    The statements Pyfisch removed in the initial batch were different: there we knew bots had imported them incorrectly into Wikidata. --- Jura 17:28, 8 October 2020 (UTC)
There is a bunch of dates already labeled "circa" by VIAF, but this qualifier is missing for these dates on Wikidata. In addition, dates that are stated as "19.." or "20th century" in the sources VIAF uses are recorded in VIAF as 1950 and imported into Wikidata. This issue equally applies to dates with decade precision. While I can't be sure that the data for people with "date of birth: 1950" in VIAF is wrong, as there are people who were actually born in 1950, it is very likely. --Pyfisch (talk) 09:24, 16 October 2020 (UTC)

RegularBot 3

RegularBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: GZWDer (talk • contribs • logs)

Task/s: Automatically import articles from Russian Wikinews

Code: Using harvest_template.py, and newitem.py

Function details: Recently some bots created many pages in Russian Wikinews. ~10000 articles per day is expected (but once the initial import in Russian Wikinews is completed, it may be 100-300 per day). This task involves: (The following may be done separately)

  •  Oppose there is still a need for cleanup from previous bot runs. The above seems to run without approval. --- Jura 09:46, 9 September 2020 (UTC)
    • Specific issues may be easy to fix; see the earliest edits of this account. But please point them out.--GZWDer (talk) 10:28, 9 September 2020 (UTC)
      • I think you have done sufficient test edits. I have asked this account to be blocked until any of its planned tasks are approved, especially as the operator thinks approval isn't needed. --- Jura 10:33, 9 September 2020 (UTC)
        • So other than "without approval", are there any issues with this specific task and the bot's edits?--GZWDer (talk) 11:56, 13 September 2020 (UTC)
@SCIdude, Charles Matthews: For this task only, I do not expect duplicates.--GZWDer (talk) 10:36, 13 September 2020 (UTC)
@Edoderoo: For this task only, eventually more statements will be added (see "Function details").--GZWDer (talk) 10:38, 13 September 2020 (UTC)
I am not going to do this task without approval, but does the community have more comments?--GZWDer (talk) 11:55, 13 September 2020 (UTC)
@Jura1: Do you have more comments about this task in particular? I feel that we should not have issues with this task.--GZWDer (talk) 14:15, 14 September 2020 (UTC)
What are your thoughts about the applicability to your bot/flood/etc accounts of "The bot operator is responsible for cleaning up any damage caused by the bot" (see Wikidata:Bots#Bot_accounts)? --- Jura 15:56, 14 September 2020 (UTC)
@Jura1: Existing issues are being fixed (please point them out). I do not expect issues from this task.--GZWDer (talk) 16:02, 14 September 2020 (UTC)
Can you do a sum-up of recently raised issues and provide ways we can check that they are fixed? You can't just open a new bot request and expect people to repeat every problem you are meant to fix every time. --- Jura 16:04, 14 September 2020 (UTC)
@Jura1: Inspired by User:Research Bot/issues (another bot that caused a large number of issues), I have created User:GZWDer/issues. Feel free to expand it.--GZWDer (talk) 17:17, 14 September 2020 (UTC)
@Jura1: Did you identify any other issues?--GZWDer (talk) 13:04, 15 September 2020 (UTC)
@Jura1: Do you have any concern?--GZWDer (talk) 11:21, 17 September 2020 (UTC)
Why is User_talk:GZWDer/2019#Prefixes_in_labels still not fixed? It was discussed at [25] just recently, and yet you still fixed just one type of thousands of problematic labels; check Q75645721, Q76226338, Q75911351. You probably repaired fewer than the other people who helped you.
Also, the Peerage import led to the addition of information about countless minors and other non-notable persons. As even a supporter of that import brought up, there is no consensus for such publications on Wikidata. These still need to be selected and proposed for deletion.
As you kept running this bot without approval, I think it's better blocked indefinitely. --- Jura 06:51, 20 September 2020 (UTC)
  •  Oppose (Repeating what I said in another RFP) This user has no respect for the infrastructure's capacity in any way; these accounts, along with two others, have been making Wikidata basically unusable (phab:T242081) for months now. I think all other approvals of this user should be revoked, not to add more on top. (Emphasis: This edit is done in my volunteer capacity) Amir (talk) 06:55, 20 September 2020 (UTC)
Comment: I think that would go too far. But I have thought for some time now that community regulation of bot editing should be put on a more organised footing. And I say this as someone who makes many runs of automated edits (channelled through QuickStatements). We need better definitions of good practice, and clearer enforcement.
Currently, we try to deal with ramifying issues and loose specifications with threaded discussions, spread over many pages. The whole business needs to be taken in hand. Structure is required, so that the community can manage the bots and the place is not simply an adventure playground for them. Charles Matthews (talk) 07:06, 20 September 2020 (UTC)
  • Obviously, everybody makes errors or might overlook some aspects once in a while, but most other operators are fairly reliable and try to clean up behind them. --- Jura 07:13, 20 September 2020 (UTC)
I don't know why you say that. Systematic problems with constraint violation is an area where major bots simply ignore the bot policy and good practice. Charles Matthews (talk) 07:40, 20 September 2020 (UTC)
do you have a sample? --- Jura 07:43, 20 September 2020 (UTC)
I work on cleaning up Wikidata:Database reports/Constraint violations/P486. If you graphed the "Unique value" violations over time (first section), you would see that they climbed gradually to over 3.1K. This was largely the work of one bot, whose owner was ignoring the issue. I had those edits, which were over-writing corrections, stopped in mid-2019. No bot corrections were made subsequently: I remove the violations by hand, and they are down to 40% of the peak. There are other properties where similar problems continue, to this day. Charles Matthews (talk) 08:02, 20 September 2020 (UTC)
If you don't think the operator's response on User talk:ProteinBoxBot is adequate, I'd ask for a block. It's not ok that it overwrites your edits. --- Jura 08:27, 20 September 2020 (UTC)
Well, I wouldn't. I discussed the matter at WikiDataCon on Berlin, as a dispute that needed to be resolved. I came to an understanding, face-to-face, and that was the pragmatic thing to do. That is really my point: no principles were documented, no fixes agreed, the whole thing was done with bare hands. Since I have considerable dispute resolution experience on enWP, I could see that was the way to go. There is no formal dispute resolution here on Wikidata, and the problems are complex. There is a two-dimensional space, one dimension being the range of issues, and the other the fixes. While informal dispute resolution is better in at least 90% of cases, the piecemeal approach and lack of documentation is not OK, and something should be done about it. We are talking about the difference between 2015, when people were grateful to have bot operators working away, and 2020 when Amir can talk as above, which is an informed judgement. I don't think reducing the "fix" dimension to blocks and bans is adequate, though: that is my Arbitration Committee experience talking. Charles Matthews (talk) 08:43, 20 September 2020 (UTC)
If you are happy with the outcome, why bring it up here? Either the bot operates as it should or it doesn't. --- Jura 08:52, 20 September 2020 (UTC)
I didn't say I was happy. You did ask for a sample. I'm coming from a direction that sees more nuance, more human factors. What is said in Wikidata:Bots is "Monitor constraint violation reports for possible errors generated or propagated by your bot", which implies self-regulation. I think, having dealt with GZWDer also in a major dispute, that language is too weak, and hard to enforce. Charles Matthews (talk) 09:07, 20 September 2020 (UTC)
I haven't read that bot's talk page in detail, but if it overwrites other editors' contributions that fix things, this is a major problem that has nothing to do with constraint violations. On some wikis, they end up blocking the operator over such conduct. --- Jura 09:16, 20 September 2020 (UTC)
Well indeed. And if such code is still in use, it is because of inertia in replacing older, Python-based libraries, I would guess, when there are certainly better solutions available. Which is a desirable change. That issue is at least in part about implementing change, and overcoming reluctance to change code whose development costs should now have been fully depreciated. In the case of ProteinBoxBot, there is contract work done here, but not properly declared here as it should be under the Wikimedia general terms of use (IMO). As far as I'm concerned this is all a can of worms. The legacy code issue clearly does apply to GZWDer, too. When I talk about the inadequacy of a piecemeal approach, these are some of the considerations I have in mind. An on/off switch for bot editing really is crude if we want to get to the root of things. We may see things very differently, but this is what is on my mind when I argue for more "structure". Charles Matthews (talk) 09:33, 20 September 2020 (UTC)
WikidataIntegrator (as also used by ProteinBoxBot) has plenty of issues and bugs, to which I have applied more than a dozen local fixes (not pushed upstream, as some are just hacks or task-specific), but removing existing statements is a fundamental problem. In the long term I plan to get rid of it completely, but I do not know when I can work on an alternative.--GZWDer (talk) 16:03, 20 September 2020 (UTC)
Maybe you should look at what Magnus Manske has been doing with Rust for the past 18 months. Charles Matthews (talk) 19:09, 20 September 2020 (UTC)
@Charles Matthews: Magnus's Rust bot still has many serious issues, like Topic:V2fzk650ojg2n6l1 and [26]. Currently Magnus has not brought it to a usable state. The code cannot be used without substantial fixes.--GZWDer (talk) 21:05, 20 September 2020 (UTC)
What I said about inertia. Charles Matthews (talk) 04:26, 21 September 2020 (UTC)
@Ladsgroup: I said this task will not run with more than 60 edits every minute. Do you still oppose?--GZWDer (talk) 15:43, 20 September 2020 (UTC)
@Ladsgroup: Do you have any comment on discussions above?--GZWDer (talk) 22:00, 21 September 2020 (UTC)
  •  Oppose What's the point in adding that many Russian Wikinews articles? Do they really need to be imported? Is there any chance any of them will ever need to be linked?--Hjart (talk) 07:20, 20 September 2020 (UTC)
    • There is so much more to Wikidata than only linking to other languages that I do not know where to start to answer your question. Edoderoo (talk) 07:39, 20 September 2020 (UTC)
      • So such a concern is not valid. @Hjart:.--GZWDer (talk) 15:36, 21 September 2020 (UTC)
  •  Oppose Do we really need more objects that are instances of Wikinews articles? --Sabas88 (talk) 07:01, 23 September 2020 (UTC)
    • Providing metadata is just one of many purposes of Wikidata. Ideally, it should be expected that every Wikimedia article (other than Wiktionary) can have an item. @Sabas88:--GZWDer (talk) 18:41, 23 September 2020 (UTC)
@Ladsgroup, Hjart, Sabas88: Do you have any further comments?--GZWDer (talk) 02:08, 29 September 2020 (UTC)
You have pinged me three times and posted on my talk page as well. Given that WMDE is going to remove noratelimit from bots, your bot hopefully won't cause more issues, but to me you have lost your good standing with regard to respecting the infrastructure's capacity. Amir (talk) 18:52, 10 October 2020 (UTC)

RegularBot 2

RegularBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: GZWDer (talk • contribs • logs)

Task/s: Mass creation of items from Wikimedia articles and categories.

Code: Using a modified version of newitem.py

Function details: I intend to move all mass sitelink import features to this bot. Feel free to raise your concerns. --GZWDer (talk) 07:28, 8 August 2020 (UTC) @Tagishsimon, SCIdude, Hjart, Jheald, Edoderoo, Animalparty: @Charles Matthews, Voyagerim, Sabas88, Jean-Frédéric, Ymblanter: Should we import unconnected pages from Wikipedia at all?

  • Previously this was cleaned up annually or semi-annually
  • In my opinion, even if importing new items will result in duplicates, not importing them at all defeats the purpose of Wikidata
    • Not importing results in a large and infinitely growing backlog which includes many duplicates, with few people fixing them (such as the cebwiki one)
    • Importing them at least allows users to find them using various tools (including when the item is improved)
    • Some wikis have people cleaning up unconnected pages, but many wikis do not
  • Alternatively we only import items of a specific age (the default of newitem.py is 14 days since creation and 7 days since the last edit, and there are bots importing from nlwiki and cswiki using such settings; but I use a 1/0 setting); a rough sketch of such age-gating is shown below

--GZWDer (talk) 07:46, 8 August 2020 (UTC)
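To illustrate the age-gating mentioned above, here is a rough pywikibot sketch (not newitem.py itself; the exception name and the entity-data format passed to editEntity may differ between pywikibot versions) of creating an item only for pages older than the 14/7 thresholds:

from datetime import datetime, timezone

import pywikibot

MIN_PAGE_AGE_DAYS = 14  # days since page creation
MIN_EDIT_AGE_DAYS = 7   # days since the last edit

def age_in_days(timestamp):
    """Days elapsed since a pywikibot revision timestamp (treated as UTC)."""
    if timestamp.tzinfo is None:
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - timestamp).days

def maybe_create_item(page):
    """Create an empty item for a long-unconnected page, roughly in the style of newitem.py."""
    if (age_in_days(page.oldest_revision.timestamp) < MIN_PAGE_AGE_DAYS
            or age_in_days(page.latest_revision.timestamp) < MIN_EDIT_AGE_DAYS):
        return  # too new or too recently edited: give humans time to connect it first
    try:
        page.data_item()  # raises NoPageError if no item is connected yet
        return            # already connected
    except pywikibot.exceptions.NoPageError:
        pass
    repo = page.site.data_repository()
    dbname = page.site.dbName()
    item = pywikibot.ItemPage(repo)
    item.editEntity({'sitelinks': {dbname: {'site': dbname, 'title': page.title()}}},
                    summary='Create item for a long-unconnected page')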

Discussion

  •  Support if you use the default 14/7 - I'm editing such items right now. However, it would help if you could make a quick page that lists tools to find them. I used PetScan for the titles and a WD dump, but this is out of reach for most people. --SCIdude (talk) 07:59, 8 August 2020 (UTC)
    • Note: I will use different settings for different wikis, and I don't think 14/7 is the right solution (especially for actively edited pages). For most wikis I suggest 1/0 with a template skip list, unless there is someone actively cleaning up unconnected pages who suggests a different setting (in many wikis there is not).--GZWDer (talk) 08:38, 8 August 2020 (UTC)
      • Before you import unconnected pages and especially if you do 1/0 you should definitely talk with affected communities. You should not just import them without notifying anyone. --Hjart (talk) 08:59, 8 August 2020 (UTC)
  •  Oppose without much better efforts to identify matches to existing items, and (if new items must be created) to much more comprehensively import information to new items, to make them properly identifiable from their statements. Also support delay to see whether new items get moved, merged, added to, or deleted. And above all, please can we encourage creators of new articles to link their own new articles, rather than assume that a bot (might) come and do it for them. Oppose any bot doing this, unless there is such an information campaign on major wikis. Jheald (talk) 09:05, 8 August 2020 (UTC)
    • @Jheald: "encourage creators of new articles to link their own new articles" - but as long as there are users not doing that, there will always be a backlog of unconnected pages. Moreover, many tools (PetScan, HarvestTemplate, projectmerge, etc.) require an existing item to work, and thus for the best effect items should be created beforehand.--GZWDer (talk) 09:15, 8 August 2020 (UTC)


  •  Oppose Duplicates will only be found if many properties are filled in on these items. Creating empty duplicates is moving the problem from area A to area B, where you can even discuss how big the issue is if some items from the Zulu wiki (or any other low-volume wiki) do not get connected to any Wikidata item within a week. This should ALSO be discussed with the communities of these wikis. Edoderoo (talk) 09:18, 8 August 2020 (UTC)
    • @Edoderoo: This also means that if a page is never connected, a duplicate is never found. Creating an item allows using tools like projectmerge or KrBot's number merge, or, more simply, a search; in addition, when more data are added, duplicates will surface (User_talk:Ghuron/Archives/2018#Extra_US_presidents). In most wikis, there are no people taking care of unconnected pages; even in the most active ones like nlwiki, a bot doing so is still required ([27]).--GZWDer (talk) 09:25, 8 August 2020 (UTC)
      • For me it is NOT an issue if page XYZ on the Zulu wiki is not connected to Wikidata and not to any other wiki. And again, you didn't answer the part: creating EMPTY items will not be any help in finding duplicates. So how many properties will your bot add, and how will it define those properties? I know from my own personal experience that a bot adding properties leads to other HUGE issues and tremendous extra manual double-checking work. Edoderoo (talk) 09:38, 8 August 2020 (UTC)
      • Agree. Duplicates do not exist in the meta-space of WD plus WP. It is either a WD duplicate or not. Before you create the item there is no WD duplicate. --SCIdude (talk) 09:42, 8 August 2020 (UTC)
          • @Edoderoo: In my opinion a hidden duplicate is still a duplicate. This task only covers creating new items; adding statements is another thing. (User:NoclaimsBot is a good idea, but we need to generalize it - it currently only works on a few wikis and there is no workflow for suggesting a new template or category to add.) @SCIdude: This basically means we will have an infinitely growing number of unconnected items until someone takes care of them (unlikely in smaller wikis), which defeats the purpose of Wikidata being used as a centralized interlanguage links store.--GZWDer (talk) 09:49, 8 August 2020 (UTC)
            • Creating blank, shitty items that have no value at all can be done by anyone using PetScan. But if your bot cannot add any value, I am absolutely against it. You had better put your energy into creating value, instead of shitty volume. Edoderoo (talk) 10:09, 8 August 2020 (UTC)
              • Not every page will be handled by a human in due time (see the BotMultichillT example).--GZWDer (talk) 10:21, 8 August 2020 (UTC)
              • @GZWDer: regarding NoclaimsBot. More wiki's can be added and it's easy and anyone can add other templates. Categories produced too many false positives so I didn't implement that part. Multichill (talk) 13:18, 8 August 2020 (UTC)
  • @Mike Peel: I think PiBot is approved for some activities in this kind of area. Any thoughts on best practice, minimum requirements, what other bots are active, and how to function appropriately in this area? Jheald (talk) 09:35, 8 August 2020 (UTC)
    • I'm generally in support of this bot, I come across quite a lot of items that it's usefully created. I think waiting for 2 weeks after page creation is a good idea if the items are going to be blank - pi bot creates them within an hour for humans, but it's also matching pages with existing ones wherever possible, plus it's importing basic data about the humans from enwp at the same time. It would be nice if there was a way of finding matches between unconnected pages and existing items, to avoid duplicates, but this is tricky. I have some scripts that search Wikidata for the page title to find matches, skipping items that already have a sitelink, but they then need a human to check through the matches to see if they are correct. Thanks. Mike Peel (talk) 10:03, 8 August 2020 (UTC)
      • Yeah, though Pi bot is only run on some wikis.--GZWDer (talk) 10:21, 8 August 2020 (UTC)
  •  Oppose for wikis that mass-generate articles automatically like cebwiki (not sure if there are others?). Support for other wikis, with the current age restrictions (14/7). − Pintoch (talk) 10:10, 8 August 2020 (UTC)
    • Another point is, if items are not created at all, few people will notice the existence of local pages. Many people imported locations from various sources, with many duplicates of the cebwiki ones. If duplicates (specifically hidden ones) are going to happen, let them happen as early as possible.--GZWDer (talk) 10:21, 8 August 2020 (UTC)
  • Oppose until a satisfactory explanation is offered about why it is better than the failed proposal Wikidata:Requests for permissions/Bot/GZWDer (flood) 2. – The preceding unsigned comment was added by Jc3s5h (talk • contribs) at 10:53, 8 August 2020 (UTC) (UTC).
    • Duplicates will exist, but creating items will make them visible and allow people to actively work on them. In previous years there were significant imports from other sources without regard for possible duplicates.--GZWDer (talk) 11:17, 8 August 2020 (UTC)
  • I wrote newitem.py to make sure the backlog doesn't get too large. The idea is to give users the time to connect articles to existing items or create new items with some statements. If that doesn't happen in time, the bot comes along to create an empty item so that we don't get an ever-growing backlog. The current settings for the Dutch Wikipedia are: the article has to be at least 49 days old and the last edit has to be at least 35 days ago. Running this bot with creation set to 1 day and last edit set to 0 days is ridiculous. Why this rush?  Oppose with those crazy settings. Multichill (talk) 13:18, 8 August 2020 (UTC)
    • @Multichill: What is the purpose of a last-edit threshold?--GZWDer (talk) 14:20, 8 August 2020 (UTC)
      • If you're going to run this on many wiki's I would do a conservative approach and set creation around 30 days and last edit around 45 days. Multichill (talk) 14:28, 8 August 2020 (UTC)
        • @Multichill: I still do not get the point of the last-edit threshold. Many local wiki users do not care about Wikidata, and I can find many articles in DYK, GA or FA without a Wikidata item connected.--GZWDer (talk) 14:33, 8 August 2020 (UTC)
          • To increase the likelihood that the article is in a stable situation. For example, you don't want to create items for articles that are nominated for deletion. Multichill (talk) 14:39, 8 August 2020 (UTC)
            • @Multichill: FA, GA and DYK articles are usually not "stable" in this sense, as they are current hot topics. Instead the bot skips pages containing certain defined templates.--GZWDer (talk) 15:55, 8 August 2020 (UTC)
  •  Oppose in the same sense as Multichill. I have a page with some tools I use successfully to find matches of particular interest to me. If a duplicate item is created, rather than an available match, then this PetScan query, related to s:Dictionary of National Biography, will not pick it up (within its scope), until it is merged here, which I know can take a year. I use this PetScan query to patrol for likely candidates, and this works well. When an item, for a person, needs to be created, I use https://mix-n-match.toolforge.org/ first, to see whether I can create an item with some identifiers on it, as a start. These workflows have worked fine for me. They have been disrupted by the short lead time: there may be some trade-off here, but I will not be sure about effect of creating the needing-to-be-merged-here items until they start appearing in the medium term.
Where, please, is the urgency of a change in the status quo? Waiting a few weeks is sensible, in the general case. There needs to be a clear explanation of what is currently broken and requiring to be fixed. Charles Matthews (talk) 14:08, 8 August 2020 (UTC)
  • @Charles Matthews: Currently there are no bots at all to clean up the backlog in most wikis. If possible, I can run the enwiki one with the default setting, which will at least significantly reduce the backlog. For other wikis, alternative settings may be used.--GZWDer (talk) 14:22, 8 August 2020 (UTC)
Three points here:
  1. As is usual in discussions with you, you do not answer the question directly, but start another line of discussion. While this may be politeness in some ways, it does not correspond to the needs of wiki culture.
  2. When I read this diff from your user talk page, I thought that you simply don't understand the merging issue for duplicates. Technically, merging is fairly easy on Wikidata. But to do it responsibly, particularly for human (Q5) items but also in other areas such as medical terms, is hard work.
  3. When you ask for a bot permission that gives you many options that you might use, I'm inclined to refuse. I think you should specify what you will do, not talk about what you might do. If you define some backlogs you want to clear, and say how you might clear them, that might be OK.
Charles Matthews (talk) 14:48, 8 August 2020 (UTC)
@Charles Matthews: For this task only, it has only one job - fully automatically creating new items from Wikimedia pages. For point #2: it is not a good thing either that many hidden duplicates sit in an infinitely growing backlog of unconnected pages that nobody is cleaning up. Things are even worse when many new items are created, which increases the number of hidden duplicates. (Mix'n'Match can only find pages in one wiki.) --GZWDer (talk) 15:21, 8 August 2020 (UTC)
Well, I gave a link to a serious dispute on your user talk page. This dispute actually needs to be resolved. It changes the situation. Let me explain: I do not always agree with the idea that bot tasks should be completely specified. It is usually better if the bot operator agrees to stay within the bot policy. The dispute is about good practice in the creation of duplicates here, which is not currently mentioned in the bot policy.
But the way you are arguing is likely to have the result that not creating too many duplicates is added to the bot policy. Because many people disagree with you. In the end, disputes are resolved by addressing the issues.
A possible solution here is to divide up the language codes into groups, and try to get some agreement on how long to wait for each group of codes. If you can give details of wikis that "nobody is cleaning up", probably that could be a basis for discussion. If you are really saying this is a "long tail" problem, where there is more in the "infinitely growing backlog" of "hidden duplicates", as you call it, than we all know, then we do need to understand how fat the tail is. If there are 250 out of ~300 wikipedias involved, generally the smallest, then maybe it is comprehensible as an issue. The ceb wikipedia is clearly an edge case, and we should exclude it at present. Charles Matthews (talk) 16:14, 8 August 2020 (UTC)
@Charles Matthews: See User:GZWDer_(flood)/Automatic_creation_schedule, currently involving 151 different wikis (including all Wikipedias with more than 10000 articles except four). See also here for the number of unconnected pages older than two weeks per wiki and here for history of backlog in enwiki.
Previously, when PetScan was used, pages with a title the same as the label of an existing item were skipped by default. However, I don't think this is good practice, as the list of skipped pages is itself infinitely growing. So I decided to import all of them; duplicates can be found once items are created (especially when more statements are added).--GZWDer (talk) 16:45, 8 August 2020 (UTC)
It seems you are missing the point of what I am saying, and also the point that almost everyone here is opposing. Charles Matthews (talk) 18:41, 8 August 2020 (UTC)
@Charles Matthews: Creation of items from unconnected pages will result in duplicates unless they are checked one by one, which is not possible in an automatic process. The need for connections does not disappear if the items are not created. So let it happen, which will surface the work that needs to be done, unless significantly many people are doing it another way (i.e. cleaning up unconnected pages manually).--GZWDer (talk) 18:47, 8 August 2020 (UTC)
To be clear, you need to engage here with criticism. If your attitude is "all or nothing", then clearly at this time you get nothing. Charles Matthews (talk) 19:03, 8 August 2020 (UTC)
@Charles Matthews: Do you agree to automatically create new items for articles older than a threshold (14 days by default, possibly a bit longer for wikis with users actively handling unconnected pages)? Duplicates will happen nevertheless (there is no automatic way to prevent them), but unconnected pages are likely to be abandoned (i.e. not actively handled) if they are not handled within a specific timeframe. In other words, we currently have two workflows (handle unconnected pages before any automatic import, or handle them after the import), and this proposes a cut-off point beyond which the loss of creating items late (i.e. being unable to use tools for extending items, and possible duplicates with recently created items) outweighs the gain (i.e. fewer duplicates at creation, and more time for human handling).--GZWDer (talk) 13:42, 9 August 2020 (UTC)
@GZWDer: I can agree to a two-phase system, in which (phase I) newly-created wikipedia items are left for a period, and then (phase II) automatic creation of a Wikidata item takes place. In all your suggestions, it seems to me, you make phase I too short. I agree that there is a kind of trade-off here, and that we can accept some duplicates caused in phase II. That doesn't mean that phase II of automated creation has to be blind to the duplication issue. I don't think it is a good idea to apply the same workflow to all wikipedias. (And I would say, as a Wikisource editor, there is much work to do there, also.) Charles Matthews (talk) 13:59, 9 August 2020 (UTC)
@Charles Matthews: Originally I also wanted to cover the Wikisources; as more controversy is expected (and was met in the past), the schedule currently only includes the Chinese one. So for phase II there are some options:
  1. Create items for older articles en masse, as I originally proposed.
  2. Increase the interval between creations, e.g. create items once each year, which is what I did between 2014 and 2020 - this does not solve all issues.
  3. Not creating the items at all. This will result in an infinitely growing backlog, which I am strongly worried about (even for cebwiki). Also, in the future other users will create items covering the same topics without noticing the local articles.
  4. Manually checking each article - this requires language skills and is not always scalable
  5. Create them subset by subset (I imported more than 40000 individual English Wikisource articles)
  6. And other ideas?--GZWDer (talk) 14:10, 9 August 2020 (UTC)

@GZWDer: In the big picture, this really is not a simple issue. Here is a table for what seems to be needed.

Plan for: Phase I Phase II Comments
A: smaller wikipedias
B: larger wikipedias
C: other wikis

There is a comments column because: firstly, there are points about scope (deciding about "smaller", "larger" and ceb); secondly, there are issues about item creation in Phase II. Charles Matthews (talk) 08:03, 10 August 2020 (UTC)

 Oppose Looks to me like you want permission to run a new bot which basically just does the same thing your old bot did. It's clear to me that you will not get a go for something like that. If you want permission for a new bot, you will need to build something that addresses our concerns substantially better than your old one.--Hjart (talk) 16:12, 8 August 2020 (UTC)

This is what is being planned:

Plan for | Setting | Comments
Group 1: all Wikipedias other than those listed below | 14/0 (14/7 is also OK, but not my preference) | Should we split this into larger and smaller wikis with different settings?
Group 2: some Wikipedias such as dawiki | 21/0 (or 21/7, 30/7) | Wikis with active users handling Wikidata. Alternatively, each wiki may use a custom setting.
Group 3: zhwiki, zhwikisource, all Wikinews | 1/0 | If agreed, any Wikidata actions (i.e. improvement and merging) can happen after item creation. The client sitelink widget functions regardless of whether an item exists.
nlwiki, cswiki | Not to be done |
cebwiki | Planned mass import regardless of duplicates, then treated as Group 1 | Leaving pages unconnected indefinitely will result in more and more duplicates
arzwiki | Currently skipped, but eventually to be done | Currently there are many articles created based on Wikidata that are not connected to Wikidata; this is being fixed
Wikisource (other than zh): non-subpages in the main namespace and pages in the Author namespace | Treated as Group 1 (by default) or 2 |
Wikisource (other than zh): subpages | Manual batch import on a case-by-case basis |
  • Wikisource (other than zh) is not planned initially
  • "Pages" includes articles and categories, but non-Wikipedia categories are not planned initially

--GZWDer (talk) 08:39, 10 August 2020 (UTC)

  •  Oppose until previous problems are fixed, e.g. User_talk:GZWDer#Mass_creation_of_items_without_labels. --- Jura 12:15, 11 August 2020 (UTC)
  •  Oppose if the behaviour of the flood bot isn't addressed. Could you please add some heuristic before creating the item, for example check for similarly named items and if there's already something with a 50% similarity leave it in a list to review manually? --Sabas88 (talk) 08:56, 14 August 2020 (UTC)
    • @Sabas88: I am afraid that this will still result in a backlog. For some time, when the Wikidata item creator was functional (it has since been deprecated in favor of PetScan), it checked and skipped any page whose title was the same as the label of an existing item. After several runs, the list of skipped pages became longer and longer. I do not think checking beforehand by a human is scalable. Anyway, new pages are held for several days, and users may create items or link pages to existing ones. It is unlikely for a page to be taken care of once a significant period has passed since it was created.--GZWDer (talk) 23:17, 14 August 2020 (UTC)
  • Comment Is there any way we can stem this problem at the source - when a user creates a new page on a language wiki, can we get the UI to immediately try to link it to an existing Wikidata item and encourage the user to select the right one? Is there maybe a Phabricator task for this? This sort of bot action really can't be the correct long-term solution for this problem. I have run across many, many, maybe over a hundred, such page creations that should have been linked to an obvious existing Wikidata item and required a later item merge on Wikidata. ArthurPSmith (talk) 17:53, 18 August 2020 (UTC)
I like the idea by @ArthurPSmith: very much. @Lydia Pintscher (WMDE): what do you think? Would that be possible to implement? From my point of view, one problem is that a lot of creators of articles, categories, navigational items, templates, disambiguations, lists, commonscats, etc. are either not aware of the existence of Wikidata or forget to connect a newly created article etc. to an already existing object, or to create a new one if none exists yet (which leads to a lot of duplicates that have to be merged manually if this creation or connection is not done by hand but by a bot instead). An additional step after saving a newly created article etc. could present to the user a list of Wikidata objects that might be matching (e.g. a list of persons with the same name; it could be a similar algorithm to the duplicate check / suggestion list in PetScan, duplicity example 1 and duplicity example 2), or the option to create a new one if none matches. Thanks a lot! --M2k~dewiki (talk) 22:47, 28 August 2020 (UTC)
also ping to @Lucas Werkmeister (WMDE), Mohammed Sadat (WMDE), Lea Lacroix (WMDE): for info --M2k~dewiki (talk) 22:50, 28 August 2020 (UTC)
Also ping to @Lantus, MisterSynergy, Olaf Studt, Bahnmoeller: In addition, I think User:Pi bot, operated by @Mike Peel:, does a great job of connecting to existing objects or creating new ones if none exist for items regarding people (currently only for the English Wikipedia; until June 2020, for about one year, also for the German Wikipedia, thanks a lot to Mike! - In my opinion this should be reactivated for the German Wikipedia as well). Of course, the algorithm could be improved, for example by also considering various IDs (like GND, VIAF, LCCN, IMDb, ...). The algorithm is described here: User_talk:Mike_Peel/Archive_2#Matching_existing_wikidata_objects_with_unconnected_articles.

Since this very fundamental problem of connecting articles to existing objects or creating new objects for unconnected pages (when, by whom, how to avoid duplicates, ...) for hundreds of newly created articles per day in different language versions has been discussed for years now, the above proposal by ArthurPSmith could be a solution to it. It might be combined with specialized bots like Mike's Pi bot for people (and maybe others for movies, geographic objects, lists, categories, ...).

Also see

Also for info to @Derzno, Jean-Frédéric, Mfchris84, Giorgio Michele, Ordercrazy: another problem regarding item creation and duplicates is that there are a lot of already existing entries, e.g. for French or German monuments, churches, etc., which contain the monument ID. E.g. for Bavarian monuments there are currently 160,000 Wikidata objects. But if a user connects a newly created article to an (unconnected) commonscat for this monument (using "add other language" in the left navigation), an additional Wikidata object is created, so there is one object containing the sitelinks to the article and the commonscat and another one with the monument ID. Currently the only solution is to connect a newly created commonscat for a monument as soon as possible to the already existing Wikidata object with the monument ID, so that if a user connects an article to this commonscat, the existing Wikidata object will be used; otherwise a new one with only the two sitelinks will be created. For example, in 2020 so far about 1,000 new commonscats for Bavarian monuments have been created which have not been connected to the already existing Wikidata objects by the creators of the commonscats.

Also see:

Hello @Lantus, Olaf Studt, Bahnmoeller: I have now created these two pages:

The first one might help to find unconnected articles, categories, templates, ... from de-WP and connect them to existing items, or to create new Wikidata items. The second one might help to enhance existing items with IDs (using HarvestTemplates for GND, VIAF, LCCN/LCAuth, IMDb, ...) or other properties (e.g. using PetScan based on categories). Parts of the functionality of these two pages might sooner or later be implemented in (specialized) bots. --M2k~dewiki (talk) 01:23, 29 August 2020 (UTC)

The problem with Bavaria is fairly complex and a mixture of many different issues. It started with the bot transfer in 2017, followed by several other root causes. In the end the datasets are in a very bad shape and it's a nightmare to clean up. Day by day I find new surprises. Currently I'm working hard to get a couple of these issues fixed. On top of these bot issues we need to find a way to stop people working in the same way as on the German Wikipedia. Some folks keep pulling things together again and again to be in line with articles, so the P4244 issue list will be filled up again and again with duplicates, wrong usage and violations. I have no idea how we can stop this, but personally I have given up discussing with people who have no mindset for database design and definitions. Most are living in their own world. Anyhow, crying doesn't help and I'm doing my best to work through the P4244 issues. To be honest, this is a job for months and many items need to be checked manually. So I'm not happy to be loaded, through the back door, with new issues from a bot task. --Derzno (talk) 03:31, 29 August 2020 (UTC)
  • As I have said many times: in most wikis there are not enough people to take care of unconnected pages. If possible, I can postpone the item creation (the plan is 14 days after article creation), but the backlog must be cleaned eventually.--GZWDer (talk) 04:11, 29 August 2020 (UTC)
@Derzno: the problem is not only related to Bavarian monuments, but affects all cases in all languages and all language versions of Wikipedia and all sorts of object types (e.g. movies with totally different names in different languages, chemical compounds, ...) where datasets have been imported before but not connected to articles, commonscats, etc. How would a user find the right item among the 90 million existing ones (Special:Statistics)? What if a user looks for "Burgstall Humpfldumpf", does not find it, and therefore creates a new item for this article/commonscat combination, while there might already exist an item "Bodendenkmal D-123-456-789", or one under the Japanese or Russian name? Duplicates might eventually be identified and merged via identical IDs (like GND, LCCN/LCAuth, VIAF, IAAF, IMDb, monument IDs like the Palissy ID for French monuments, the BLf ID for Bavarian monuments, the DenkXWeb Objektnummer for Hesse state monuments, BLDAM for monuments from Brandenburg, LfDS for monuments from Saxony, P2951 for Austrian monuments, the CAS number for chemical compounds, ...). How could this matching by ID (currently there are more than 8,000 properties, many of them IDs) be handled at large scale, given that every day hundreds of new articles are created in the various language versions of Wikipedia which need to be connected to possibly already existing items? --M2k~dewiki (talk) 07:21, 29 August 2020 (UTC)
So we create these items (including duplicates) first, and someone will improve them; duplicates get discovered and merged. Originally I expected this to become the primary workflow for unconnected pages - this is why I previously ran the bot at 1/0 instead of the default 14/7. There are people who take care of new Wikipedia articles; previously my expectation was to move the Wikidata handling completely to after item creation. Using a delay is expected by many users, but the work (i.e. clearing unconnected pages) should eventually be done. In other words, I give people some time to do the Wikidata connection, and after the time limit, new items are created automatically.--GZWDer (talk) 09:05, 29 August 2020 (UTC)
Also see Wikidata:Contact_the_development_team#Connecting_newly_created_articles_to_existing_objects_resp._creating_new_object_-_additional_step_when_creating_articles,_categories,_etc. (difflink). --M2k~dewiki (talk)
  •  Oppose. Moving a backlog from place A to place B makes sense only if place B has a more active community or better tools, but this does not seem to be the case. I often encounter duplicates created by this kind of bots many years ago, and at least on my home wiki some of them might have been spotted earlier if they had simply remained unconnected. --Silvonen (talk) 16:18, 12 September 2020 (UTC)
    • Without a bot of this kind a page can be left unconnected indefinitely, which may not be optimal for users of PetScan, HarvestTemplate and projectmerge, or even for users trying to find an item about the topic (they will never know about a local unconnected page, and it is almost impossible to check each of 900 wikis to find whether a topic exists). If a specific wiki does not have enough people to handle all unconnected pages, we have a reason to mass-create them (after a period). (Yes, for wikis with some active users doing so, we can postpone the creation; but even nlwiki requires a bot to clean up the backlog.)--GZWDer (talk) 10:34, 13 September 2020 (UTC)
  •  Oppose. I'm sceptical in general, but actively hostile to anything run by this user ever since I found a group of hundreds of duplicate items that were so easy to connect to their originals that I managed to do it with QuickStatements. Bots currently have several orders of magnitude more capacity here than active manual editors. With that in mind, running a bot that carelessly adds wrong/duplicate items requiring manual correction wastes the more precious resource. A semi-automated workflow would be preferable, and might have an easier time attracting users if my impression is correct that creating new items is the subjectively more rewarding experience compared to correcting existing items. --Matthias Winkelmann (talk) 00:22, 13 September 2020 (UTC)
    • "new items is the subjectively more rewarding" - yes, for the reason I stated above. It requires some work to clean up duplicates, but bringing them to Wikidata will allow more users to notice them, especially for wikis with few users handling them locally.--GZWDer (talk) 10:34, 13 September 2020 (UTC)
@Charles Matthews: You have not commented on this plan yet.--GZWDer (talk) 10:35, 13 September 2020 (UTC)
@GZWDer: I commented on 8 August that the urgency of item creation here for newly-created articles on wikipedias is not as great as you are assuming. My view remains the same. Certainly for enWP, which most concerns me, waiting longer and adding more value to items that are created is a good idea. So I will not support a plan of this kind. Charles Matthews (talk) 10:47, 13 September 2020 (UTC)
@Charles Matthews: For wikis with active users handling unconnected pages, creation can wait a bit longer. But a page is less likely to ever be connected if it has not been connected for a while (a trade-off must be chosen here), and not creating the items also impedes the use of many tools (as I responded to Silvonen).--GZWDer (talk) 11:04, 13 September 2020 (UTC)
@GZWDer: Clearly, there are a number of trade-offs to consider here. But since we don't agree about those trade-offs, we are not so likely to agree on a plan. I am arguing from my actual workflow, starting with PetScan (queries on User:Charles Matthews/Petscan). I become involved in article writing, such as w:Sir James Wright, 1st Baronet, through using queries. Using those queries is positive for my work on enWS and enWP. I think Wikidata is important in integrating Wikimedia projects, so I do not oppose the principle of automated creation of items here. But I do oppose doing it too quickly. Charles Matthews (talk) 11:15, 13 September 2020 (UTC)
  • Hmm, Wikidata:Requests for permissions/Bot/JonHaraldSøbyWMNO-bot 2 - this is one of the reasons I proposed to mass-import pages from the Cebuano Wikipedia (and other wikis): others will import something similar, so importing them earlier will reduce the number of duplicates.--GZWDer (talk) 14:14, 28 September 2020 (UTC)
  • I didn't read the whole discussion, but shouldn't this be handled on the Wikipedia side? After a user saves their article, a window reminding them to connect the article to a Wikidata item should pop up, or something similar. Eurohunter (talk) 16:15, 21 December 2020 (UTC)
  •  Oppose Frankly, I am getting a bit tired of all these one-sitelink item creations. From a Wikipedia point of view, statements should not be taken from the Wikipedias (and especially not with tools or bots that don't reuse the existing citations), and the length of the backlog does not matter at all. On a priority list on Wikipedia this backlog of unconnected pages is always going to be low down, as it should be.--Snaevar (talk) 18:19, 21 December 2020 (UTC)
@Eurohunter: also see meta:Community Wishlist Survey 2021/Wikidata/Creation of new objects resp. connecting to existing objects while avoiding duplicates. --M2k~dewiki (talk) 18:28, 21 December 2020 (UTC)
@M2k~dewiki: Just wanted to vote but it ended. Eurohunter (talk) 20:05, 21 December 2020 (UTC)
  • In case it wasn't clear earlier, I  Support this bot request. Duplicates are an issue (I frequently merge items created by this bot), so I think it is best if the bot waits for a few days before creating the item, but not running it creates a backlog of unconnected items that gets in the way of matching new items. Pi bot also now imports various statements (such as commons category links and descriptions, hopefully coordinates soon) for non-humans, but only if the item already exists - and again, not having the Wikidata item creates backlogs for those tasks. @GZWDer: I know you don't like it, but could you adopt the '14/7' rule please, and clear the backlog? Thanks. Mike Peel (talk) 19:11, 28 December 2020 (UTC)
  • So:
Plan for | Setting | Comments
Default: all Wikipedias, and Wikisource non-subpages | 14/0 or 14/7 | -
Some specific wikis (please comment below) | TBD |
All Wikinews | 1/0 | If approved, will succeed Wikidata:Requests for permissions/Bot/RegularBot 3
nlwiki, cswiki | Not to be done |
cebwiki | Items will be created with at least one identifier (or source) other than Geonames | The actual code is to be developed.
arzwiki | Currently skipped | Will be re-evaluated if bot creation of articles is stopped
  • @Jheald, Edoderoo, Pintoch, Jc3s5h, Charles Matthews, Hjart: Please comment, if you want a different configuration, either in general, or in a specific wiki.--GZWDer (talk) 19:26, 28 December 2020 (UTC)
    @GZWDer: Being pragmatic (what has a chance of being approved?), I suggest that you just look at Wikipedias for this task, go with 14/7 with a list of excluded Wikipedias, and leave the rest for other bot tasks. Thanks. Mike Peel (talk) 19:33, 28 December 2020 (UTC)
    @GZWDer: I've said it before, but since you don't seem to understand it, I guess it needs to be said again: you need to actively ask every single Wikipedia for permission before running any bots on them. The Danish Wikipedia, for example, has had people handling unconnected pages for years, and I guess many other Wikipedias have too. At the very least, don't touch dawiki. Thanks --Hjart (talk) 22:27, 28 December 2020 (UTC)
  • @Hjart: Does your community run a bot that cleans up the very old backlog? If not, I will run it on 30-day-old pages. P.S. You did not respond to my comment at Wikidata:Requests_for_permissions/Bot/RegularBot 3.--GZWDer (talk) 22:31, 28 December 2020 (UTC)
    @GZWDer: Yes, we do have such a bot. And from watching some German activity, I guess they do too. Again, please ask every single community before doing anything to their backlogs. And don't touch dawiki at all. --Hjart (talk) 22:38, 28 December 2020 (UTC)
    OK. --GZWDer (talk) 22:39, 28 December 2020 (UTC)
  • I still oppose this, as I am not confident the operator can respect the views of the community on this. General lack of trust in them given the history in this area. If this task is important, someone else will step in to do it, no one is (or should be) irreplaceable. − Pintoch (talk) 21:43, 30 December 2020 (UTC)
  • There clearly isn't yet a meeting of minds here. Charles Matthews (talk) 11:10, 2 January 2021 (UTC)

RegularBot[edit]

RegularBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: GZWDer (talkcontribslogs)

Task/s: Doing fully-automatic and periodic tasks (see below)

Code: See /data/project/largedatasetbot-regular in Toolforge

Function details:

Current tasks are:

Potential future tasks:

Before these tasks are moved to a dedicated bot account, they are performed via the GZWDer (flood) account. They need to be moved to a new account to cope with phab:T258354. (The request is not ready to be approved until we have decided how to use the user group.) --GZWDer (talk) 13:24, 4 August 2020 (UTC)

 Support Definitely preferable to run this under a bot account, thanks! ArthurPSmith (talk) 18:37, 4 August 2020 (UTC)
  •  Oppose until problems from previous runs of the operator's bots are fixed, e.g. User_talk:GZWDer#Mass_creation_of_items_without_labels. --- Jura 12:16, 11 August 2020 (UTC)
    • @GZWDer: I don't think it's adequate that the response to problems raised on the operator's talk page is limited to noting that the same tool won't be used again and that another bot will clean it up (which hasn't happened in four months). Can you present a plan to investigate previous problems with your bots and a way to track their resolution? I don't expect you to fix them all yourself (you can place requests on Wikidata:Bot_requests), but similar problems need to be identified and you need to ensure they don't recur. It's not OK that you bork items for Wikisource (leaving it to others to clean up) and then years later do the same again. --- Jura 05:59, 12 August 2020 (UTC)
      • @Jura1: Do you find any examples that are not fixed?--GZWDer (talk) 14:40, 12 August 2020 (UTC)
        • I had found 2000. I don't think it's for Matej or myself to fix, or to check whether identified problems are fixed or not. It's really up to you to do that. Can you do that and come back? --- Jura 06:16, 13 August 2020 (UTC)
          • @Jura1: For example?--GZWDer (talk) 08:20, 13 August 2020 (UTC)
            • Sample for what? --- Jura 09:01, 13 August 2020 (UTC)
              • @Jura1: Do you find any items that have such issue and are not fixed?--GZWDer (talk) 09:04, 13 August 2020 (UTC)
                • The comment linked above pointed to 2000 of them. --- Jura 09:12, 13 August 2020 (UTC)
                • @Jura1: But they are fixed.--GZWDer (talk) 09:18, 13 August 2020 (UTC)
                  • The question for you is whether your bot(s)/account(s) created more similarly defective items and whether they have all been fixed since. Further, whether all other defects raised with you have been followed up. --- Jura 09:22, 13 August 2020 (UTC)
                    • I don't think so. Feel free to point to an example if it is not the case.--GZWDer (talk) 09:24, 13 August 2020 (UTC)
                      • The problem with the label of Q75877437 raised in 2019 is still unresolved (and probably thousands of similar ones). --- Jura 09:30, 13 August 2020 (UTC)
                        • @Jura1: See https://w.wiki/ZaU --GZWDer (talk) 23:37, 14 August 2020 (UTC)
                        • Ty. When I wrote that comment in 2019 I thought it was helpful to include an entire regex of cases that needed fixing. I fixed some, others fixed more, but there are still some left. Maybe we should add it to Wikidata:Bot_requests. --- Jura 17:02, 15 August 2020 (UTC)
  • Comment If this request is approved (on which I am giving no opinion), it should only be on the firm condition that the bot creates no new items under this task -- i.e. any job that might involve creating items would need to be submitted as a new bot request, for separate discussion, and should not go ahead under this approval. Any such new bot request would need to set out in detail for consideration what actions were being proposed to avoid the creation of duplicates, and how the new items would be properly populated with enough statements to make them well identifiable. Given GZWDer's previous tendency to be rather "relaxed" on both these scores in the past (at least in the eyes of many), I believe this limitation, and requirement in future of specific approval before any such tasks, to be necessary. Jheald (talk) 13:59, 12 August 2020 (UTC)
    • This task will only create items from Prime Page, I hope there will be no duplicates.--GZWDer (talk) 14:39, 12 August 2020 (UTC)

Orcbot[edit]

Orcbot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: EvaSeidlmayer (talkcontribslogs)

Task/s: The bot makes use of author-publication matches from the ORCID database to match existing publication items and author items in Wikidata.

Code: https://github.com/EvaSeidlmayer/orcid-for-wikidata

Function details: The bot matches author items and publication items on the basis of the ORCID database. Only author items and publication items that already exist in Wikidata are matched.

The 2019 ORCID data dump contains eleven archive files. For the first archive file we were able to detect:

  • 457K Wikidata publication items (3.8M publications in total)
  • 425K publication items do not have any author item registered
  • 32K publications are identified in Wikidata with registered authors
  • of those 32K publication items:
    • 3.7K author items listed in Wikidata are correctly allocated to their publication items (11.7%)
    • 4.2K author items listed in Wikidata are not yet allocated to publication items (24.6%)
    • the other authors are not registered in Wikidata yet.

These are the numbers for only the *first* of *eleven* ORCID files. It would be great to introduce the matching of authors to publications on the basis of ORCID.

  • @EvaSeidlmayer: Thanks for working on this. One thing I don't see in your github README or your statement here is how you plan to match up the authors with the existing author name string (P2093) entries for these articles - or is the plan just to add the author (P50) entries with no qualifiers and not removing the existing name strings? Matching name strings is quite tricky, especially as given names are often abbreviated, some parts of names may be left out, joined together in different ways, etc. Not to mention name changes... And there can be two authors on the same paper with the same surname, or partially matching surnames. These issues have tripped up a number of automated approaches here in the past. There are also issues with duplicate or otherwise erroneous ORCID records, which have also tripped things up - for example there have been some major imports of this sort of author data from Europe PMC which, when there are duplicate ORCID's, lists both, resulting in an offset for all the author numbers (series ordinal (P1545) qualifiers) after that point. Anyway, this is definitely useful, but can be harder than it seems. ArthurPSmith (talk) 17:50, 30 July 2020 (UTC)
  • @EvaSeidlmayer, ArthurPSmith:  Support. It is true that the bot task is harder than described. However, it is important to begin even if the bot is not complete; the bot's source code can be developed further later. Just a brief note: I advise Eva to create a user page for the bot on Meta. --Csisc (talk) 18:25, 30 July 2020 (UTC)
  • @Csisc: I created a user page on Meta. However, I now don't see how I can create/retrieve an API token for the bot. Is there any documentation?
Please also make some test edits.--Ymblanter (talk) 19:15, 12 August 2020 (UTC)
  • @Ymblanter: I tried to make some test edits in the test Wikidata instance, taking the different property numbers into account as well. However, I was told I do not have the bot right for pushing to the test instance. Where can I get the bot right for the test instance? --Eva (talk) 16:03, 26 August 2020 (CET)
    Sorry, I am not sure I understand the question. Can you make about 50 test edits here? You do not need the bot flag for the test edits.--Ymblanter (talk) 18:58, 26 August 2020 (UTC)
    @Ymblanter: Hm. Strange. I checked again, the instance I refer to is the test instance: "wb config instance / https://test.wikidata.org/w/api.php" But when I push only *one* file (such as "wb create-entity Q123.json") I get: "{ assertbotfailed: assertbotfailed: You do not have the "bot" right, so the action could not be completed...." What did I do wrong? --Eva (talk) 09:44, 27 August 2020 (CET)
    Unfortunately, I do not know. You may want to ask at a better watched place such as the Project Chat--Ymblanter (talk) 18:27, 27 August 2020 (UTC)
  • I managed to do some test edits with Orcbot in the test instance. In order to subsequently connect them with Orcbot by adding author statements (P242) to article items, I manually created some scientific article items and author items.
    • authors:
      • Josepha Barrio Q212734
      • Shuai Chen Q212749
      • Raphael de A da Silva Q212755
    • articles:
      • Prevalence of Functional Gastrointestinal Disorders in Children and Adolescents in the Mediterranean Region of Europe. Q212738
      • Dietary Saccharomyces cerevisiae Cell Wall Extract Supplementation Alleviates Oxidative Stress and Modulates Serum Amino Acids Profiles in Weaned Piglets Q212750
      • Amino-acid transporters in T-cell activation and differentiation. Q212751
      • Dietary L-glutamine supplementation modulates microbial community and activates innate immunity in the mouse intestine. Q212752
      • Insight in bipolar disorder: a comparison between mania, depression and euthymia using the Insight Scale for Affective Disorders. Q212753
      • Changes in absolute theta power in bipolar patients during a saccadic attention task. Q212754


The articles now have an author statement that was missing before. The template for the connection looks like this: {"id": "Q212754", "claims": {"P242": {"value": "Q212755", "qualifier": [{"P80807": "('Rafael', 'de Assis da Silva')"}]}}}

@Csisc, Ymblanter: What is the next step to establish the Orcbot? --Eva (talk) 14:04, 1. September 2020 (CET)

Could you please do a few edits here (they may be the same as on test wikidata if appropriate).--Ymblanter (talk) 20:06, 1 September 2020 (UTC)
@EvaSeidlmayer: Can you write down the message in red issued by the compiler? --Csisc (talk) 09:49, 2 September 2020 (UTC)
@Csisc: Not sure if this is the expected message, but this is what I get when I try to log in after I reset the credentials: "invalid json response body at http://www.wikidata.org/w/api.php?action=login&format=json reason: Unexpected token < in JSON at position 0". This is the red part. However, first I am asked to "use a BotPassword instead of giving this tool your main password". --Eva (talk) 14:02, 2. September 2020 (CET)
@EvaSeidlmayer: Try to use requests.post instead. See https://www.wikidata.org/w/api.php?action=help&modules=login for login documentation. --Csisc (talk) 14:17, 3 September 2020 (UTC)
Hey @Csisc:, when I'm logged in as EvaSeidlmayer@Orcbot using abc1def2ghi3jkl4mno5pqr6stuv7wxyz as the password I receive this message: "permissiondenied: You do not have the permissions needed to carry out this action." I use Wikidata-CLI for the interaction. --Eva (talk) 22:43, 4. September 2020 (CET)
@EvaSeidlmayer: Try to use Orcbot as a username (just the bot username). You can also change to Wikidata Integrator (https://pypi.org/project/wikidataintegrator/). --Csisc (talk) 11:35, 8 September 2020 (UTC)
It worked after I updated the bot password to include "edit existing pages". :) Afterwards, I was able to do the test edits:

The authors are now registered (P50) to their publications:

Q48080592 Changes in absolute... → Q47701823 Raphael de A da Silva
Q40249319 Insight in bipolar... → Q47701823 Raphael de A da Silva
Q43415493 The complete picture of changing pediatric inflammatory... → Q85231573 Josefa Barrio
Q37721105 Dietary Saccharomyces cerevisiae... → Q61824599 Shuai Chen 
Q41082700 Amino-acid transporters..  → Q61824599 Shuai Chen
Q51428341 Dietary L-glutamine supplementation.. → Q61824599 Shuai Chen

@Csisc:, sorry it took so much time! --Eva (talk) 09:27, 9. September 2020 (CET)

@EvaSeidlmayer: This is an honour for me. --Csisc (talk) 15:00, 9 September 2020 (UTC)

What is the next step to get this approved? NMaia (talk) 13:28, 24 November 2020 (UTC)

I still do not see test edits--Ymblanter (talk) 19:56, 25 November 2020 (UTC)
@EvaSeidlmayer: Did you make the test edits by running the bot script with your account, e.g. this edit to add an author? I notice that you didn't add stated as (P1932) or series ordinal (P1545) qualifiers and the author name string (P2093) claim for the same author was not removed. Will Orcbot make these edits when importing data?
Presuming Orcbot is going to add stated as qualifiers, will the name formatting be consistent with an item's existing author and author name string statements? Since the large imports of scholarly article (Q13442814) bibliographic data were, to the best of my knowledge, primarily from PubMed and CrossRef, there is a risk that using a different source (i.e. ORCID) could result in inconsistent data, such as a combination of initialised and full given names. It won't be an issue when adding authors to new publication items created by Orcbot. But it might be preferable to handle existing items differently and copy data from the existing author name string to a new author claim. Simon Cobb (User:Sic19 ; talk page) 01:08, 8 January 2021 (UTC)
Hey @Sic19:, thank you for thinking along! Regarding the problem of potentially different name formatting from PubMed, CrossRef and ORCID: OrcBot requests all labels and aliases for an author QID (which, according to the ORCID public data file, is supposed to be registered as author (P50) of an article). OrcBot uses the following command for doing this:
wb d author_QID  | jq -r '.labels,(.aliases|.[])|.[].value' | sort | uniq 

Then, OrcBot compares all of these spellings with the names stated in author name string (P2093). By this means, OrcBot makes sure that the series ordinal from author name string (P2093) can be transferred correctly to author (P50). Does this solve your objection? Did I understand you correctly? Eva (User:EvaSeidlmayer ; talk page) 19:07, 14 January 2021 (UTC)
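
For illustration, a minimal Python sketch of this comparison step, assuming the labels and aliases have already been fetched (e.g. with the wikibase-cli call above) and that the author name string (P2093) values are available together with their series ordinals; the normalisation rules and the example data are illustrative only, not OrcBot's actual code:

import unicodedata

def normalize(name):
    """Lower-case, strip accents and punctuation so that spelling variants compare equal."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    return " ".join(name.lower().replace(".", " ").split())

def match_series_ordinal(author_spellings, author_name_strings):
    """Return the series ordinal of the P2093 value matching one of the author's spellings.

    author_spellings    -- labels and aliases of the author item
    author_name_strings -- list of (name string, series ordinal) pairs taken from P2093
    """
    known = {normalize(s) for s in author_spellings}
    for name_string, ordinal in author_name_strings:
        if normalize(name_string) in known:
            return ordinal
    return None  # no match; leave the item for manual review

# Hypothetical example:
print(match_series_ordinal(
    ["Rafael de Assis da Silva", "Raphael de A da Silva"],
    [("R. de Assis da Silva", "3"), ("Raphael de A da Silva", "5")]))  # -> "5"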


Hey, sorry for the late response. Yes, OrcBot runs as EvaSeidlmayer. I can change this if necessary. When Rdmpage pointed out the lack of series ordinal (P1545) and author name string (P2093), I stopped OrcBot (in November 2020). I am currently working on an improvement of OrcBot which involves transferring the series ordinal information from author name string (P2093) to author (P50). Afterwards, the author name string (P2093) statement will be deleted, as some tools cannot deal with both statements (author (P50), author name string (P2093)) at the same time. Eva (User:EvaSeidlmayer ; talk page) 18:44, 14 January 2021 (UTC)

OpenCitations Bot[edit]

OpenCitations Bot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Csisc (talkcontribslogs)

Task/s: Adding references and citation data for scholarly publications found in Wikidata using Wikidata tools and OpenCitations.

Code: Not developed

Function details:

  • This bot retrieves the Wikidata ID and DOI of scholarly publications using WDumper. Then, it uses the REST API of OpenCitations to retrieve the DOIs of the references and citing works of each publication. Finally, the obtained DOIs are converted to Wikidata IDs using the WDumper-based dump, and the final output is automatically added to Wikidata as cites work (P2860) relations via the QuickStatements API (a sketch of this pipeline follows below).
  • The License of OpenCitations is CC0.
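
For illustration, a minimal sketch of this pipeline in Python, assuming the COCI REST endpoint /references/{doi} and a pre-built DOI→QID mapping extracted from the WDumper-based dump (the mapping and the example QID/DOI are hypothetical); the output is QuickStatements-style rows:

import requests

COCI_REFS = "https://opencitations.net/index/coci/api/v1/references/{doi}"

def cites_work_statements(qid, doi, doi_to_qid):
    """Yield QuickStatements rows adding cites work (P2860) for one citing publication.

    qid        -- Wikidata item of the citing publication
    doi        -- its DOI
    doi_to_qid -- dict mapping DOIs to QIDs, built from the WDumper-based dump
    """
    response = requests.get(COCI_REFS.format(doi=doi.lower()))
    response.raise_for_status()
    for record in response.json():
        cited_qid = doi_to_qid.get(record["cited"].lower())
        if cited_qid:  # references without a Wikidata item are skipped for now
            yield f"{qid}|P2860|{cited_qid}"

# Hypothetical usage:
# for row in cites_work_statements("Q21090066", "10.1371/journal.pone.0115253", doi_to_qid):
#     print(row)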

--Csisc (talk) 13:23, 29 July 2020 (UTC)

We had another bot doing this work a while ago, is it no longer operational? Or was there a reason it stopped? Also, since each article usually has a dozen or more references, sometimes many times that, it would be better to add the references in a single update, rather than one at a time as would be necessary through QuickStatements. Other than that, yes it would be good to get this added to Wikidata. ArthurPSmith (talk) 17:51, 29 July 2020 (UTC)
@ArthurPSmith: What I found out is that there are many publications in Wikidata not linked to their reference publications although reference data about them are available in OpenCitations. I can restrict the work to scholarly publications not having any reference using SPARQL. --Csisc (talk) 09:34, 30 July 2020 (UTC)
@ArthurPSmith: We had User:Citationgraph bot and User:Citationgraph bot 2 work on this. Both stopped operating in 2018, since their operator, User:Harej, had rearranged his priorities. Yes, it would make sense to add all cites work (P2860) statements for an item in one go, e.g. via Wikidata Integrator. Not sure how the bot should handle citations of things for which Wikidata does not have an entry yet — perhaps with "no value" and "stated as", so that the information can be converted later as needed. --Daniel Mietchen (talk) 09:02, 7 September 2020 (UTC)
@ArthurPSmith: @Daniel Mietchen: Just wanted to mention that Citationgraph bot seems to be back online and Citationgraph bot 2 should follow along soon. I wish we had something like Scroll To Text Fragment widely supported as a web standard, or some paragraph-based anchoring, so I could point to the exact paragraph in this long thread (look for "Harej" there instead). --Diegodlh (talk) 22:05, 15 February 2021 (UTC)
@Csisc: Hi! I understand this was part of the Wikicite grant proposal you presented last year. I'm sorry it wasn't approved. Do you plan developing the bot anyway? Now that Elsevier has made their citations open in Crossref, I understand COCI coverage will see a dramatic increase next time it is published (last time was 07 Dec 20, before Elsevier's announcement). Thank you! --Diegodlh (talk) 04:53, 28 January 2021 (UTC)
Diegodlh: Of course, I am still planning to develop the bot. However, we need a server to host it. If the bot can be hosted, I do not mind developing it. Elsevier's agreement to include its citation data in the OpenCitations corpus will certainly allow a trustworthy coverage of citation data in the Wikidata graph. --Csisc (talk) 12:55, 31 January 2021 (UTC)
Hi, @Csisc:! Thanks for answering. Sorry, I'm relatively new to this. Couldn't it be hosted on Toolforge? --Diegodlh (talk) 18:49, 1 February 2021 (UTC)
@Diegodlh: I am studying this. The matter with Toolforge is that the Cloud can be easily blocked. --Csisc (talk) 12:25, 2 February 2021 (UTC)
  • I am very excited about this project. The reason my old bot shut down was, among other factors, the scaling issues. I was no longer able to get a reliable mapping of Wikidata items and DOIs from the Wikidata Query Service. The use of WDumper addresses that nicely. For data sources I also recommend PubMed Central. Harej (talk) 21:40, 9 September 2020 (UTC)
Please develop the code and make some test edits.--Ymblanter (talk) 19:30, 10 September 2020 (UTC)
  • Just as an observation, I have been trying to produce a dump of DOIs on Wikidata, and the task has yet to complete after seven days and as of writing is going to take months to complete. However I am developing an alternative strategy for producing lists of identifiers and hope to share more later. Harej (talk) 22:51, 6 October 2020 (UTC)
    • I have generated a dataset of Wikidata items with DOIs as of the 20 August 2020 dump. This should definitely help you get started. Harej (talk) 21:51, 7 October 2020 (UTC)
Ymblanter, Harej: I thank you for your answers. I will consider your comments and develop the bot over the coming months. --Csisc (talk) 12:55, 31 January 2021 (UTC)
Great, I will be looking forward.--Ymblanter (talk) 20:09, 31 January 2021 (UTC)

TwPoliticiansBot[edit]

TwPoliticiansBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Jd3main (talkcontribslogs)

Task/s: Import data of politicians in Taiwan.

Code: We are still working on the code. The GitHub link will be added soon.

Function details: We plan to crawl data from the database of the Central Election Commission (link). When there are potential errors or duplications, the bot will skip those entries and report them to the operator. --TwPoliticiansBot (talk) 14:31, 12 July 2020 (UTC)

T cleanup bot[edit]

T cleanup bot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Jura1 (talkcontribslogs)

Task/s: clean up leftovers from the last incident, once the discussion is closed

Code:

Function details: Help clean up as needed ----- Jura 17:39, 21 June 2020 (UTC)

Non-admins can not delete items.--GZWDer (talk) 20:02, 21 June 2020 (UTC)
@Jura1: Is this request still active? If so, please provide a permanent link to the relevant discussion. Hazard-SJ (talk) 06:06, 7 October 2020 (UTC)
  • I think some cleanup is still needed. I keep coming across duplicates. Beyond the identified ones, there are supposedly plenty more. Given that @Epìdosis: wants to work with them, we might as well keep them. It seems that most other users don't care, or filter them out as well. --- Jura 18:57, 7 December 2020 (UTC)
    • Duplicates still need to be merged in large numbers (thousands), but this needs human checking, as discussed. I cannot provide much more help, but I wouldn't delete anything, as the items are well sourced and, despite the duplication, contain valuable information. In fact, this is not the only import of items that is well sourced but also has a high percentage of duplicates, and unfortunately the only way to act in these cases is to try to merge the duplicates (mostly manually). --Epìdosis 21:00, 7 December 2020 (UTC)
      • If you want to keep them and clean them up, let's close this. Just bear in mind that what you might consider well sourced can be a wiki or tertiary source, with eventually similar problems as Wikipedia in general. --- Jura 21:22, 7 December 2020 (UTC)

OlafJanssenBot[edit]

OlafJanssenBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: OlafJanssen (talkcontribslogs)

Function: Replace dead, outdated or non-persistent links to websites of the KB (national library of the Netherlands) in Wikidata with up-to-date and/or persistent URLs

Code: https://github.com/KBNLwikimedia/WikimediaKBURLReplacement and https://github.com/KBNLwikimedia/WikimediaKBURLReplacement/tree/master/ScriptsMerlijnVanDeen/scripts,

Function details: This article explains what the bot currently does on the Dutch Wikipedia (bot edits on WP:NL are listed here). I want to be able to do the same URL replacements in Wikidata, for which I'm requesting this bot flag. The bot flag for this type of task is already enabled on the Dutch Wikipedia; see here for the approval.
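
A rough sketch of what such a replacement could look like on the Wikidata side with pywikibot; the URL prefixes and the property used below are placeholders, and the real rules live in the repository linked above:

import pywikibot

# Placeholder prefix mapping; the actual patterns are defined in the bot's repository.
URL_REPLACEMENTS = {
    "http://www.kb.nl/old-path/": "https://www.kb.nl/new-path/",
}

def fix_kb_urls(item, property_id="P973"):  # P973 (described at URL) as an example property
    """Rewrite outdated KB URLs in one property's statements to their persistent form."""
    item.get()
    for claim in item.claims.get(property_id, []):
        url = claim.getTarget()
        if not isinstance(url, str):
            continue
        for old, new in URL_REPLACEMENTS.items():
            if url.startswith(old):
                claim.changeTarget(url.replace(old, new, 1),
                                   summary="Replace outdated KB URL with a persistent URL")

site = pywikibot.Site("wikidata", "wikidata")
fix_kb_urls(pywikibot.ItemPage(site.data_repository(), "Q4115189"))  # sandbox item, for testing only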

--OlafJanssen (talk) 21:45, 11 June 2020 (UTC)

I will approve this task in a couple of days, provided that no objections will be raised. Lymantria (talk) 09:45, 20 June 2020 (UTC)

@Lymantria, OlafJanssen:

  • I don't really see it doing useful edits. It's somewhat pointless to edit Listeria lists ([28], etc.) and one should avoid editing archive pages [29][30]. --- Jura 10:16, 24 June 2020 (UTC)
  • Discussion should take place at User talk:OlafJanssen. Lymantria (talk) 10:21, 24 June 2020 (UTC)
  • I think it should be un-approved. Shall I make a formal request? --- Jura 10:26, 24 June 2020 (UTC)
    • No. Let's reopen this discussion. Lymantria (talk) 08:07, 26 June 2020 (UTC)

Recipe Bot[edit]

Recipe Bot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: U+1F360 (talkcontribslogs)

Task/s: Crawl https://www.allrecipes.com and insert items into Wikidata with structured data (ingredients and nutrition information).

Code: TBD. I haven't written the bot yet. I would like to get feedback before doing so.

Function details:

  • Crawl https://www.allrecipes.com and retrieve the structured data (example) for a recipe.
  • Parse the list of ingredients and nutrition information (see the sketch after this list); halt if any items are not parsed cleanly.
  • Check whether a Wikidata item already exists (unlikely, but a good safety check).
  • Create an item for the recipe with the title, structured information (ingredients and nutrition information), and URL to the full work.
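
As a sketch of the crawl-and-parse step, assuming (as on many recipe sites) that each page embeds a schema.org/Recipe record as JSON-LD in a <script type="application/ld+json"> tag; the field names follow schema.org, everything else is illustrative:

import json
import requests
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except ValueError:
                pass  # skip malformed JSON-LD blocks instead of halting

def extract_recipe(url):
    """Return the first schema.org Recipe object found on the page, or None."""
    parser = JsonLdExtractor()
    parser.feed(requests.get(url).text)
    for block in parser.blocks:
        for node in (block if isinstance(block, list) else [block]):
            types = node.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                return {
                    "name": node.get("name"),
                    "ingredients": node.get("recipeIngredient", []),
                    "nutrition": node.get("nutrition", {}),
                    "total_time": node.get("totalTime"),
                }
    return None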

--U+1F360 (talk) 14:21, 20 May 2020 (UTC)

Admittedly I'm not familiar with WD's bot policy, but this does not seem useful: creating empty items would be useless, and there is no place on any project where we should be mass-posting recipes. Praxidicae (talk) 13:05, 22 May 2020 (UTC)
@Praxidicae: The items wouldn't be empty, they would contain metadata about the recipes. It would allow users to query recipes based on the ingredients, nutrition information, cook time, etc. U+1F360 (talk) 13:42, 22 May 2020 (UTC)
What I meant by empty is with respect to other projects. Wikidata shouldn't serve as a cookbook. This is basically creating a problem that doesn't exist. Praxidicae (talk) 13:44, 22 May 2020 (UTC)
Who says Wikidata shouldn't serve as a cookbook? I would love to query Wikidata for recipes using e.g. the ingredients I have at home. --Haansn08 (talk) 09:21, 27 September 2020 (UTC)
@Praxidicae: Most items on Wikidata do not have any sitelinks. I asked in the project chat if adding recipes was acceptable, and at least on a conceptual level that seems fine? I believe it would meet point #2 under Wikidata:Notability. I'm not sure how it's any different from Wikidata:WikiProject_sum_of_all_paintings. U+1F360 (talk) 13:52, 22 May 2020 (UTC)
I fundamentally disagree I guess. This is effectively using Wikidata as a mini project imo. Praxidicae (talk) 13:53, 22 May 2020 (UTC)
I feel like the ship has sailed on that question (unless I'm missing something). U+1F360 (talk) 13:55, 22 May 2020 (UTC)
It wasn't a question, it's me registering my objection to this request. Which I assume is allowed...Praxidicae (talk) 13:57, 22 May 2020 (UTC)
@Praxidicae: Of course it is. :) I guess my point is that the "problem" is that our coverage of recipes is basically non-existent. I'd like to create a bot to expand that coverage. A recipe is a valuable creative work. Of course I don't expect people to write articles about recipes (that seems rather silly). In the same way, we are adding every (notable) song to Wikidata... that's a lot of music. U+1F360 (talk) 14:01, 22 May 2020 (UTC)
Which is what I find problematic. There have been proposals in the past to start a recipe-based project and they have been rejected each time by the community. This is effectively circumventing that consensus. Not to mention this already exists, and I also have concerns about attribution when wholesale copying from allrecipes. Praxidicae (talk) 14:03, 22 May 2020 (UTC)
What about the copyright side? Their Terms of Use specify that the copyrights are held by the copyright owners (users) and there is no indication of a free license on the website. Recipes are not mere facts, numbers and/or IDs. Also, there is no indication of "why Wikidata needs this info". — regards, Revi 14:18, 22 May 2020 (UTC)
I'll kick myself for asking, but U+1F360, sell this to me. Explain the copyright details, explain the instructional sections, explain how alternative ingredients will work, explain how differences in measurement units in different countries will work. This is your opportunity. Sell it to all of us. Nick (talk) 15:01, 22 May 2020 (UTC)
Let me attempt to answer "all" the questions. :) For some background, I was recently trying to find recipes based on the ingredients I have on hand. Sure, you can do a full-text search on Google, but if you have 2 potato (Q16587531), it doesn't tell you whether the recipe requires 2 or fewer potato (Q16587531), just that it mentions the word. :/ Also, not to mention all the other ingredients you may need that you may not have (especially during a global pandemic). I was looking for just a database of recipes (not the recipes themselves), and as far as I could find, that doesn't exist (at least not in a structured form). I also thought of many other questions which are difficult (if not impossible) to answer without such a dataset, like: What is the most common ingredient in English-language recipes? What percentage of recipes are vegetarian? Questions like this are unanswerable without a dataset of known recipes. As far as copyright is concerned, according to the US Copyright Office:

A mere listing of ingredients is not protected under copyright law. However, where a recipe or formula is accompanied by substantial literary expression in the form of an explanation or directions, or when there is a collection of recipes as in a cookbook, there may be a basis for copyright protection. Note that if you have secret ingredients to a recipe that you do not wish to be revealed, you should not submit your recipe for registration, because applications and deposit copies are public records. See Circular 33, Works Not Protected by Copyright.

At least in the United States, the "metadata" about a recipe (ingredients, nutrition information, cook time, etc.) cannot be copyrighted and therefore exists in the public domain. Since it's unclear whether the directions of a recipe are under copyright or not, I think it's safest to leave all directions in the source. As an example, let's say we have a cookbook like How to Cook Everything (Q5918527): should we not catalog every recipe from the book in Wikidata? I would think this would be valuable information, no? In my mind this is the same difference as an album like Ghosts V: Together (Q88691681) which has a list of tracks like Letting Go While Holding On (Q93522041). I am not suggesting that we create a wiki of freely licensed recipes; as @Praxidicae: mentioned, that has been proposed and rejected many times. This is the same thing as music albums with songs or TV shows with episodes. Now, we could make up a threshold of notability for recipes. Does it need to be printed in a book? Does it need at least 3 reviews if on allrecipes? I'm not sure what makes a recipe notable or not, but in my mind they are valuable works of art that should be cataloged. U+1F360 (talk) 17:05, 22 May 2020 (UTC)
I realized I missed a few questions in there. Alternative ingredients should be marked with a qualifier of some kind. Measurements should remain in whatever unit is used in the referenced source (as we do with all other quantities on Wikidata). The measurements could be converted when a query is performed or a recipe is retrieved. U+1F360 (talk) 17:51, 23 May 2020 (UTC)
I manually created a little example, Oatmeal or Other Hot Cereal (Q95245657), from a cookbook that I own. Open to suggestions on the data model! U+1F360 (talk) 23:02, 23 May 2020 (UTC)
Here is another example: Chef John's Buttermilk Biscuits (Q95382239). Please let me know what you think and what should change (if anything). U+1F360 (talk) 17:47, 24 May 2020 (UTC)
I like the idea of having recipes in Wikidata. The examples show we need more properties/qualifiers to better describe recipes. --Haansn08 (talk) 09:40, 27 September 2020 (UTC)

LouisLimnavongBot[edit]

LouisLimnavongBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: LouisLimnavong (talkcontribslogs)

Task/s: Bot to get birthplace and nationality for a list of artists.

Code:

import pywikibot

site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, "Khalid")
item = pywikibot.ItemPage.fromPage(page)

Function details: --LouisLimnavong (talk) 13:08, 14 May 2020 (UTC)

@LouisLimnavong: It looks like creating this request was your only edit across all Wikimedia projects (and was over 5 months ago). If this request is still valid, please clarify what the task would be. Hazard-SJ (talk) 06:51, 3 November 2020 (UTC)

BsivkoBot[edit]

BsivkoBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Bsivko (talkcontribslogs)

Task/s:

  • The term "видеоигра" ("video game") is not the correct common description of video games in Russian. We can see this for ruwiki in Q7889, its root (and child) categories and other cases. However, despite the language difference, the first term was imported in bulk by bot(s?) (example). To fix this mistake we can use the same bot approach, and at the same time fill in the description for empty pages.

Example of the first case and for the second one.

Code:

  • I use pywikibot, and there is a function which checks for the presence of the mistake and fixes it. For empty cases, it prepares a short description:

def process_wikidata_computergame(title):
    # get_wikidata_item() is an existing helper elsewhere in the bot's code.
    item = get_wikidata_item("ru", title)
    if not item:
        return
    if 'ru' in item.descriptions:
        # An incorrect Russian description exists: replace the term in place.
        if "видеоигра" in item.descriptions['ru']:
            item.descriptions['ru'] = item.descriptions['ru'].replace("видеоигра", "компьютерная игра")
            item.editDescriptions(descriptions=item.descriptions,
                                  summary=u'"компьютерная игра" is the common term for "video game" in Russian')
    else:
        # No Russian description yet: add one if the item is an instance of (P31) video game (Q7889).
        p31_claims = item.claims.get('P31', [])
        if p31_claims and p31_claims[0].target and p31_claims[0].target.id == 'Q7889':
            item.descriptions['ru'] = "компьютерная игра"
            item.editDescriptions(descriptions=item.descriptions,
                                  summary=u'added Russian description')


Function details: --Bsivko (talk) 13:25, 8 May 2020 (UTC)

  • The bot works in the background alongside other article processing, and it does not do a broad scan. Bsivko (talk) 13:25, 8 May 2020 (UTC)

BsivkoBot[edit]

BsivkoBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Bsivko (talkcontribslogs)

Task/s:

Code:

  • I use pywikibot, and there is a piece of software which reads a property, makes a request to the corresponding URL, gets the page text, recognizes that the article is absent, and switches the statement to deprecated if the keywords indicating absence are found:
import pywikibot
import requests

def url_checking(title, page):
    # Resolve the Wikidata item that belongs to the given wiki page.
    try:
        item = pywikibot.ItemPage.fromPage(page)
    except pywikibot.exceptions.NoPage:
        return
    if item:
        item.get()
    else:
        return
    if not item.claims:
        return
    id_macros = "##ID##"
    # Each entry: external-ID property, URL pattern, text marking a missing article, edit summary.
    cfg = [
        {
            'property': 'P2924',
            'url': 'https://bigenc.ru/text/' + id_macros,
            'empty_string': 'Здесь скоро появится статья',  # "An article will appear here soon"
            'message': 'Article in Great Russian Encyclopedia is absent'
        },
        {
            'property': 'P4342',
            'url': 'https://snl.no/' + id_macros,
            'empty_string': 'Fant ikke artikkelen',  # "Did not find the article"
            'message': 'Article in Store norske leksikon is absent'
        },
        {
            'property': 'P6081',
            'url': 'https://ria.ru/spravka/00000000/' + id_macros + '.html',
            'empty_string': 'Такой страницы нет на ria.ru',  # "There is no such page on ria.ru"
            'message': 'Article in RIA Novosti is absent'
        },
    ]
    for single in cfg:
        if single['property'] not in item.claims:
            continue
        for claim in item.claims[single['property']]:
            # Skip statements that are already deprecated.
            if claim.getRank() == 'deprecated':
                continue
            value = claim.getTarget()
            url = single['url'].replace(id_macros, value)
            print("url: " + url)
            r = requests.get(url=url)
            print("r.status_code: " + str(r.status_code))
            # The page loads (HTTP 200) but shows its "not found" text: deprecate the statement.
            if r.status_code == 200 and single['empty_string'] in r.text:
                claim.changeRank('deprecated',
                                 summary=single['message'] + " (URL: '" + url + "').")


Function details:

  • The bot works in the background while processing other articles on ruwiki, so it does not do a broad scan. Also, there are not that many bad URLs, and therefore the activity is low (a few contributions per month). Bsivko (talk) 12:49, 8 May 2020 (UTC)

DeepsagedBot 1[edit]

DeepsagedBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Deepsaged (talkcontribslogs)

Task/s: Import Russian lexemes and senses from ru.wiktionary.org

Code:

Function details: --Deepsaged (talk) 06:16, 14 April 2020 (UTC)

@Deepsaged: please make some test edits--Ymblanter (talk) 19:06, 14 April 2020 (UTC)
@Ymblanter: done: создать (L297630), сотворить (L301247), небо (L301348) DeepsagedBot (talk) 17:26, 28 May 2020 (UTC)

It is not possible to import senses from any Wiktionary project because the licences are not compatible. Wikidata is released under CC0, while Wiktionary senses are protected by the CC BY-SA licence. Pamputt (talk) 18:34, 3 August 2020 (UTC)

Uzielbot 2[edit]

Uzielbot 2 (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Uziel302 (talkcontribslogs)

Task/s: mark broken links as deprecated

Code: https://github.com/Uziel302/wikidatauploadjson/blob/master/deprecatebrokenlinks

Function details: Simple wbeditentity calls to mark broken official links as deprecated. I did a few examples with my bot account; all the proposed edits are of the same nature. I detect broken links based on the HTTP header (no response/400/404 are considered broken). --Uziel302 (talk) 23:49, 7 April 2020 (UTC)
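
A small sketch of the detection step, reflecting the point raised in the discussion below about checking a link more than once before deprecating it; the retry count and delay are arbitrary illustrative choices, not part of the bot as described:

import time
import requests

def looks_broken(url, attempts=3, delay_seconds=3600):
    """Return True only if every attempt fails (no response, HTTP 400 or 404).

    A single failure can be temporary maintenance, geo-blocking or a sign-in wall,
    so the link is re-checked a few times before it is reported as broken.
    """
    for attempt in range(attempts):
        try:
            status = requests.head(url, allow_redirects=True, timeout=30).status_code
        except requests.RequestException:
            status = None  # no response at all
        if status not in (None, 400, 404):
            return False  # reachable at least once: do not deprecate
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # arbitrary spacing between re-checks
    return True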

No response/400/404 may have multiple causes, including temporary maintenance, content that is only accessible when signed in, content only accessible in some countries, internet censorship, etc.--GZWDer (talk) 06:14, 8 April 2020 (UTC)
GZWDer, which of these edge cases are not relevant in manual checking? How is it possible to really detect broken links? And if no such option exists, should we ban "reason for deprecation: broken link"? Uziel302 (talk) 17:23, 8 April 2020 (UTC)
This means you should not flag them as broken links without checking them multiple times.--GZWDer (talk) 17:26, 8 April 2020 (UTC)
GZWDer, no problem, how many is multiple? Uziel302 (talk) 21:45, 8 April 2020 (UTC)
I am the main bot writer on the Hebrew Wikipedia and have written over 500 bots over the years. I can testify that broken links are a big problem and we need to resolve it at the source. I discussed it with Uziel302 prior to him writing here and I am convinced that the method suggested here is the preferred one. Let's move forward and clean up these broken links so they do not bother us any more. בורה בורה (talk) 09:18, 13 April 2020 (UTC)
@GZWDer: Would you respond to the question? Is there a benchmark for considering a link broken? Repeated checks with a minimal number of checks and a minimal time span? Lymantria (talk) 08:30, 16 May 2020 (UTC)
I don't think it should set them to deprecated. You could add an "end cause" of "404" instead. --- Jura 13:23, 16 May 2020 (UTC)

WordnetImageBot[edit]

WordnetImageBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: WordnetImageBot (talkcontribslogs)

Task/s:

This bot is part of a Final Degree Project in which I link the offsets/codes of words in WordNet with the words and images in Wikidata.

Code:

To be done.

Function details: --WordnetImageBot (talk) 12:16, 18 March 2020 (UTC)

Link words and images with the words of WordNet, that is, add an exact match (P2888) URL to those words that don't have a link to WordNet yet. If a word in Wikidata doesn't have an image, this bot will add the image.
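
A rough sketch of what adding such a link could look like with pywikibot; the WordNet URL pattern below is a placeholder and depends on which WordNet version and identifier scheme the project settles on:

import pywikibot

def add_wordnet_match(qid, synset_id):
    """Add an exact match (P2888) URL pointing to a WordNet synset, unless it is already present."""
    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    url = f"http://wordnet-rdf.princeton.edu/id/{synset_id}"  # placeholder URL scheme
    if url in {c.getTarget() for c in item.claims.get("P2888", [])}:
        return
    claim = pywikibot.Claim(repo, "P2888")
    claim.setTarget(url)
    item.addClaim(claim, summary="Link item to its WordNet synset (exact match)")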

Please, make some test edits and create the bot's user page containing {{Bot}}. Lymantria (talk) 06:36, 27 April 2020 (UTC)
@Andoni723: reminder to make the test edits --DannyS712 (talk) 12:03, 7 July 2020 (UTC)

Taigiholic.adminbot[edit]

Taigiholic.adminbot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Taigiholic (talkcontribslogs)

Task/s: interwiki linking and revising

Code: using pywikibot scripts

Function details:

  • The account is already an adminbot on nanwiki, where its work is mainly interwiki linking and revising in a semi-automatic and batch way.
  • The operator owns another bot named User:Lamchuhan-bot, which mainly works on interwiki linking for NEW ARTICLES only (such work will mainly be done by this new requesting account once it gets the flag), with no revising work.
  • I am only requesting the "normal bot" flag here on this site, not an "adminbot" flag.

Thanks.--Lamchuhan-hcbot (talk) 00:17, 16 March 2020 (UTC)

Thanks.--Lamchuhan (talk) 00:19, 16 March 2020 (UTC)

I think I sort of understand what the task is, but could you please be more specific?--Jasper Deng (talk) 06:59, 16 March 2020 (UTC)

@Jasper Deng: The bot will run using pywikibot scripts on nanwiki. Some of the tasks will have to perform interwiki actions such as:

item.setSitelink(sitelink={'site': 'zh_min_nanwiki', 'title': 'XXXXX'}, summary=u'XXXXX')

Thanks.--Lamchuhan (talk) 07:34, 16 March 2020 (UTC)

Please register the bot account and make some test edits.--Ymblanter (talk) 20:24, 18 March 2020 (UTC)

GZWDer (flood) 3[edit]

GZWDer (flood) (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: GZWDer (talkcontribslogs)

Task/s: Creating items for all Unicode characters

Code: Unavailable for now

Function details: Creating items for 137,439 characters (probably excluding those not in Normalization Forms):

  1. Label in all languages (if the character is printable; otherwise only Unicode name of the character in English)
  2. Alias in all languages for U+XXXX and in English for Unicode name of the character
  3. Description in languages with a label of Unicode character (P487)
  4. instance of (P31)Unicode character (Q29654788)
  5. Unicode character (P487)
  6. Unicode hex codepoint (P4213)
  7. Unicode block (P5522)
  8. writing system (P282)
  9. image (P18) (if available)
  10. HTML entity (P4575) (if available)
  11. For characters in Han script also many additional properties; see Wikidata:WikiProject CJKV character

For characters with existing items, the existing items will be updated.
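
For illustration, a sketch of how the core data for a single character item (points 1-6 above) could be assembled; it relies on Python's unicodedata module, while block, writing system, image and HTML entity lookups are omitted because they require external tables:

import unicodedata

def describe_character(ch):
    """Return label, aliases and basic statement values for one Unicode character."""
    codepoint = f"U+{ord(ch):04X}"
    try:
        name = unicodedata.name(ch)          # e.g. 'GREEK CAPITAL LETTER OMEGA'
    except ValueError:
        name = None                          # unnamed code point
    return {
        "label": ch if ch.isprintable() else name,
        "aliases": [codepoint] + ([name] if name else []),
        "P31": "Q29654788",                  # instance of: Unicode character
        "P487": ch,                          # Unicode character
        "P4213": codepoint,                  # Unicode hex codepoint
        # P5522 (Unicode block), P282 (writing system), P18 (image) and
        # P4575 (HTML entity) would need lookup tables and are left out here.
    }

print(describe_character("Ω"))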

Question: Do we need only one item for characters with the same normalized forms, e.g. Ω (U+03A9, GREEK CAPITAL LETTER OMEGA) and Ω (U+2126, OHM SIGN)?--GZWDer (talk) 23:08, 23 July 2018 (UTC)

CJKV characters belonging to CJK Compatibility Ideographs (Q2493848) and CJK Compatibility Ideographs Supplement (Q2493862), such as 著 (U+FA5F) (Q55726748) and 著 (U+2F99F) (Q55738328), will need to be split from their normalized form, e.g. (Q54918611), as each of them has different properties. KevinUp (talk) 14:03, 25 July 2018 (UTC)

Request filed per suggestion on Wikidata:Property proposal/Unicode block.--GZWDer (talk) 23:08, 23 July 2018 (UTC)

 Support I have already expressed my wish to import such dataset. Matěj Suchánek (talk) 09:25, 25 July 2018 (UTC)
 Support @GZWDer: Thank you for initiating this task. Also, feel free to add yourself as a participant of Wikidata:WikiProject CJKV character. [31] KevinUp (talk) 14:03, 25 July 2018 (UTC)
 Support Thank you for your contribution. If possible, I hope you will also add other code (P3295) such as JIS X 0213 (Q6108269) and Big5 (Q858372) to the items you create or update. --Okkn (talk) 16:35, 26 July 2018 (UTC)
  •  Oppose the use of the flood account for this. Given the problems with unapproved, defective bot runs under the "GZWDer (flood)" account, I'd rather see this done with a new account named "bot", as per policy.
    --- Jura 04:50, 31 July 2018 (UTC)
  • Perhaps we could do a test run of this bot with some of the 88,889 items required by Wikidata:WikiProject CJKV character and take note of any potential issues with this bot. @GZWDer: You might want to take note of the account policy required. KevinUp (talk) 10:12, 31 July 2018 (UTC)
  • This account has had a bot flag for over four years. While most bot accounts contain the word "bot", there is nothing in the bot policy that requires it, and a small number of accounts with the bot flag have different names. As I understand it, there is also no technical difference between an account with a flood flag and an account with a bot flag, except for who can assign and remove the flags. - Nikki (talk) 19:14, 1 August 2018 (UTC)
  • The flood account was created and authorized for activities that aren't actually bot activities, while this new task is one. Given that defective bot tasks have already been run with the flood account, I don't think any actual bot tasks should be authorized for it. It's enough that I already had to clean up tens of thousands of GZWDer's edits.
    --- Jura 19:46, 1 August 2018 (UTC)
I am ready to approve this request, after a (positive) decision is taken at Wikidata:Requests for permissions/Bot/GZWDer (flood) 4. Lymantria (talk) 09:11, 3 September 2018 (UTC)
  • Wouldn't these fit better into Lexeme namespace? --- Jura 10:31, 11 September 2018 (UTC)
    There is no language with all Unicode characters as lexemes. KaMan (talk) 14:31, 11 September 2018 (UTC)
    Not really a problem. Language codes provide for such cases. --- Jura 14:42, 11 September 2018 (UTC)
    I'm not talking about language code but language field of the lexeme where you select q-item of the language. KaMan (talk) 14:46, 11 September 2018 (UTC)
    Which is mapped to a language code. --- Jura 14:48, 11 September 2018 (UTC)
Note: I'm going to be inactive due to real-life issues, so this request is on hold for now. Comments are still welcome, but I won't be able to answer them until January 2019.--GZWDer (talk) 12:08, 13 September 2018 (UTC)
 Support I wonder why this information has been missing from Wikidata for so long when many less notable subjects have complete data. --Midleading (talk) 02:38, 31 July 2020 (UTC)
 Oppose This user has no respect for the infrastructure's capacity in any way; these accounts, along with two others, have been making Wikidata basically unusable (phab:T242081) for months now. I think all of this user's other approvals should be revoked, rather than adding more on top. (Emphasis: this edit is made in my volunteer capacity) Amir (talk) 17:26, 17 August 2020 (UTC)
Repeating from another RFP: given that WMDE is going to remove noratelimit from bots, your bot hopefully won't cause more issues, but to me you have lost your good standing with regards to respecting the infrastructure's capacity. Amir (talk) 18:53, 10 October 2020 (UTC)
While this is open, it is important not to merge letter and Unicode character items, like Nikki did with ɻ (Q56315451) and ɻ (Q87497973), ƾ (Q56316849) and ƾ (Q87497496), ʎ (Q56315460) and ʎ (Q87498018), etc.; the whole goal of this project is to keep them apart. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 14:02, 25 January 2021 (UTC)


MusiBot[edit]

MusiBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Joaquinserna (talk • contribs • logs)

Task/s: Query the Genius.com API and add Genius artist ID (P2373) and Genius artist numeric ID (P6351) statements to the respective artist items where they haven't been added before.

Code: Not provided. Using WikidataJS for SPARQL querying and claim editing.

Function details: Sequentially query Genius.com for every possible artist ID, search Wikidata for any singer (Q177220) or musical group (Q215380) with the same label as the Genius artist name, check whether it has Genius artist ID (P2373) and Genius artist numeric ID (P6351) statements, and add them if necessary.
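
The bot itself is planned in WikidataJS; purely as an illustration of the matching step described above, a Python sketch might look like the following. The Genius endpoint and token handling are assumptions, and singers are matched here via occupation (P106) since Q177220 is an occupation item.

    # Illustrative sketch only (the bot itself uses WikidataJS): look up one
    # Genius artist ID and list Wikidata items with that label which still
    # lack Genius artist ID (P2373). Genius endpoint and token are assumptions.
    import requests

    WDQS = 'https://query.wikidata.org/sparql'
    GENIUS_TOKEN = '...'  # hypothetical API token

    def candidates_for(artist_id):
        r = requests.get('https://api.genius.com/artists/%d' % artist_id,
                         headers={'Authorization': 'Bearer ' + GENIUS_TOKEN})
        r.raise_for_status()
        name = r.json()['response']['artist']['name']

        # Singers are matched via occupation (P106), groups via instance of (P31).
        query = '''
        SELECT ?item WHERE {
          { ?item wdt:P106 wd:Q177220 . } UNION { ?item wdt:P31 wd:Q215380 . }
          ?item rdfs:label "%s"@en .
          FILTER NOT EXISTS { ?item wdt:P2373 [] }
        }''' % name.replace('"', '\\"')
        r = requests.get(WDQS, params={'query': query, 'format': 'json'})
        r.raise_for_status()
        return [b['item']['value'] for b in r.json()['results']['bindings']]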

Discussion:

I already did a successful test here, forcing Wikidata Sandbox (Q4115189) to be the updated item; Genius artist ID no. 1 is Cam'ron (Q434913), which already has Genius artist ID (P2373) and Genius artist numeric ID (P6351).

Joaquín Serna (talk) 01:11, 28 February 2020 (UTC)

Could you please make a few more test edits, and on real items?--Ymblanter (talk) 20:00, 3 March 2020 (UTC)
Done, you can check it out here. Joaquín Serna (talk)
Add Genius artist numeric ID (P6351) as a qualifier to Genius artist ID (P2373). If you gather the data from the Genius API, use Genius API (Q65660713) as the reference. Optionally, if you could also add has quality (P1552) → verified account (Q28378282) for "Verified Artist", that would be great. - Premeditated (talk) 09:42, 18 March 2020 (UTC)
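
For what it's worth, the statement shape suggested above could be expressed with pywikibot roughly as follows; using stated in (P248) for the reference is my assumption rather than part of the suggestion.

    # Rough pywikibot sketch of the suggested statement shape: P2373 with P6351
    # as a qualifier, a reference to Genius API (Q65660713), and optionally
    # has quality (P1552) -> verified account (Q28378282).
    import pywikibot

    site = pywikibot.Site('wikidata', 'wikidata')
    repo = site.data_repository()

    def add_genius_ids(item, genius_slug, genius_numeric_id, verified=False):
        claim = pywikibot.Claim(repo, 'P2373')           # Genius artist ID
        claim.setTarget(genius_slug)
        item.addClaim(claim)

        qualifier = pywikibot.Claim(repo, 'P6351')       # Genius artist numeric ID
        qualifier.setTarget(str(genius_numeric_id))
        claim.addQualifier(qualifier)

        ref = pywikibot.Claim(repo, 'P248')              # stated in: Genius API
        ref.setTarget(pywikibot.ItemPage(repo, 'Q65660713'))
        claim.addSources([ref])

        if verified:                                     # "Verified Artist"
            quality = pywikibot.Claim(repo, 'P1552')
            quality.setTarget(pywikibot.ItemPage(repo, 'Q28378282'))
            item.addClaim(quality)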

AitalDisem[edit]

AitalDisem (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Aitalvivem (talk • contribs • logs)

Task/s: This bot is made to create a sense (with a linked item) for every Occitan lexeme in Wikidata. It is the direct continuation of AitalvivemBot. It is a web application presented as a game. For each lexeme, the program will get its French translation from Lo Congres's data (unfortunately these are private data, so we can't insert them into Wikidata) and look for every item having the Occitan word or its translation in its label. It will then use the collaborative work of the community to select the good senses and, once they are validated, insert them into Wikidata. This program has the same goal as Michael Schoenitzer's MachtSinn but uses a translation database. I am also trying to make this program simple to adapt to other languages, with complete documentation.

Code: You can find my code and documentation here

Function details: It would take too long to list every function of this program (you can find them in the documentation here and here), but overall this bot will:

  • get information about lexemes, senses and items
  • create senses
  • add items to senses using Property:P5137

The program will also verify that it has enough positive responses from users before inserting a sense. All the details about how a game proceeds, how a user's reliability is tested, and the verification done before inserting a sense are in the documentation.

--Aitalvivem (talk) 15:48, 14 January 2020 (UTC)
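
As a rough, editorial illustration of the write operations behind the function details above (not the AitalDisem code itself, which is in the linked repository), a sense and its P5137 statement could be added through the wbladdsense and wbcreateclaim API modules, here driven by pywikibot so that authentication and tokens are handled automatically:

    # Rough sketch only: add one Occitan sense to a lexeme and link it to an
    # item with "item for this sense" (P5137). Error handling and duplicate
    # checks are omitted.
    import json
    import pywikibot

    site = pywikibot.Site('wikidata', 'wikidata')

    def add_sense_with_item(lexeme_id, gloss_oc, item_id):
        token = site.tokens['csrf']

        # 1. Create the sense with an Occitan gloss (wbladdsense).
        sense_data = {'glosses': {'oc': {'language': 'oc', 'value': gloss_oc}}}
        reply = site.simple_request(action='wbladdsense', lexemeId=lexeme_id,
                                    data=json.dumps(sense_data),
                                    token=token).submit()
        sense_id = reply['sense']['id']          # e.g. 'L41768-S1'

        # 2. Add item for this sense (P5137) on the new sense (wbcreateclaim).
        value = {'entity-type': 'item', 'numeric-id': int(item_id.lstrip('Q'))}
        site.simple_request(action='wbcreateclaim', entity=sense_id,
                            property='P5137', snaktype='value',
                            value=json.dumps(value), token=token).submit()
        return sense_id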

  •  Support This seems like a nice approach to collaborative translation. ArthurPSmith (talk) 17:45, 15 January 2020 (UTC)
  • Can you provide some test edits (say 50-100)? Lymantria (talk) 10:30, 18 January 2020 (UTC)
    • @Lymantria: Hi, I did two test runs on 30 lexemes; for each lexeme I made two edits: one to add a sense and the other to add an item to that sense.
Here is the list of the lexemes : Lexeme:L41768, Lexeme:L44861, Lexeme:L57835, Lexeme:L57921, Lexeme:L235215, Lexeme:L235216, Lexeme:L235217, Lexeme:L235219, Lexeme:L235221, Lexeme:L235222, Lexeme:L235223, Lexeme:L235225, Lexeme:L235226, Lexeme:L235227, Lexeme:L235228, Lexeme:L235229, Lexeme:L235231, Lexeme:L235232, Lexeme:L235234, Lexeme:L235235, Lexeme:L235236, Lexeme:L235239, Lexeme:L235240, Lexeme:L235242, Lexeme:L235243, Lexeme:L235244, Lexeme:L235245, Lexeme:L235246, Lexeme:L235247, Lexeme:L235248
The first test failed because of a stupid mistake of mine in the configuration file of the program. For the second test I had a problem when adding the item for Lexeme:L235226 because there were quotation marks in the description of the item, so I fixed the problem, ran it again, and everything went well.--Aitalvivem (talk) 10:11, 21 January 2020 (UTC)
I take it the test edits are the ones by the IP? Lymantria (talk) 08:31, 22 January 2020 (UTC)
Yes, I used the bot account to connect to the API, but I don't know why it shows the IP instead of the bot's account.--Aitalvivem (talk) 09:53, 22 January 2020 (UTC)
I would like to see you succeed in doing so. Lymantria (talk) 12:57, 26 January 2020 (UTC) (@Aitalvivem: 07:02, 29 January 2020 (UTC))
@Aitalvivem: Any progress? Lymantria (talk) 08:34, 16 May 2020 (UTC)
@Aitalvivem: Seems like there hasn't been any progress on this? Hazard-SJ (talk) 06:10, 7 October 2020 (UTC)
  • Note for closing bureaucrats: IPBE granted for 6 months per Special:Permalink/1102840151#IP blocked; please switch to permanent IPBE when you approve it. (Or the bot operator should consider using Wikimedia Cloud Services, where you don't get IP blocks and have a server environment to use.) — regards, Revi 14:14, 22 January 2020 (UTC)

BsivkoBot[edit]

BsivkoBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Bsivko (talk • contribs • logs)

I already have BsivkoBot at ruwiki. Do I need to register another account here? Or should I link the existing bot? (Right now, the user account "BsivkoBot" is not registered here.)

Task/s:

  • Check Britannica URL (P1417) and clean up invalid claims. The reason is that there are a lot of claims which link to nowhere. For instance, Q1143358 has P1417 set to sports/shortstop, which goes to https://www.britannica.com/sports/shortstop, where we get "Britannica does not currently have an article on this topic".

Code:

  • I use pywikibot, and currently I have a piece of software which gets P1417, makes a request to the URL, gets the page text, recognizes the absence of the article, and stops with a permissions exception:
    import pywikibot
    import requests

    item = pywikibot.ItemPage.fromPage(page)  # `page` comes from the bot's page loop
    if item:
        item.get()
        if item.claims and 'P1417' in item.claims:
            brit_value = item.claims['P1417'][0].getTarget()
            brit_url = "https://www.britannica.com/" + brit_value
            r = requests.get(url=brit_url)
            if r.status_code == 200:
                if "Britannica does not currently have an article on this topic" in r.text:
                    # Remove the claim that points to a missing article.
                    item.removeClaims(item.claims['P1417'],
                                      summary=f"Article in Britannica is absent (URL: '{brit_url}').")

Afterwards, I'm going to make test runs and integrate it with the other bot functions (I work with external sources at ruwiki, and in some cases automatically captured links from Wikidata are broken, which leads to user complaints).

Function details: --Bsivko (talk) 19:37, 28 December 2019 (UTC)

Please create an account for your bot here and make some test edits.--Ymblanter (talk) 21:26, 28 December 2019 (UTC)
I logged in as BsivkoBot via ruwiki and went to Wikidata. That created the account (user BsivkoBot now exists here). After that, I made a couple of useful edits by hand (not with the bot), so BsivkoBot can do something on the project. Next, I tried to run the code above and the exception changed to a different one:

{'error': {'code': 'failed-save', 'info': 'The save has failed.', 'messages': [{'name': 'wikibase-api-failed-save', 'parameters': [], 'html': {'*': 'The save has failed.'}}, {'name': 'abusefilter-warning-68', 'parameters': ['new editor removing statement', 68], 'html': {'*': 'Warning: The action you are about to take will remove a statement from this entity. In most cases, outdated statements should not be removed but a new statement should be added holding the current information. The old statement can be marked as deprecated instead.'}}], 'help': 'See https://www.wikidata.org/w/api.php for API usage. ..

I checked that it is possible to remove the claim manually, so the problem is on the bot side. Could you please help me: is it a permission problem, or should the code be different? (As I see it, this requires write rights, but I do not see any rights assigned now.) Bsivko (talk) 00:15, 29 December 2019 (UTC)
I changed the logic to set a deprecated rank instead, and it was a success! The bot changed the rank and the link disappeared for users in our article. After a test run, the code is the following:
        if item.claims and 'P1417' in item.claims:
            for claim in item.claims['P1417']:
                brit_value = claim.getTarget()
                brit_url = "https://www.britannica.com/" + brit_value
                r = requests.get(url=brit_url)
                if r.status_code == 200:
                    if "Britannica does not currently have an article on this topic" in r.text:
                        # Mark the claim as deprecated instead of removing it.
                        claim.changeRank('deprecated',
                                         summary="Article in Britannica is absent (URL: '" + brit_url + "').")

Currently, it works. I'll integrate it into production. Bsivko (talk) 11:58, 29 December 2019 (UTC)

@Bsivko: The above error means you will require a confirmed flag for your bot.--GZWDer (talk) 21:03, 29 December 2019 (UTC)
Ok, I've got it, thank you for the explanation! I already implemented the function and rank changing is enough, it resolved the problem. Bsivko (talk) 21:10, 29 December 2019 (UTC)
Note your edits may be controversial. You should reach a consensus for such edits. (I don't support such edits, but someone may.)--GZWDer (talk) 21:48, 29 December 2019 (UTC)
I understand. I just started the discussion on chat. Please, join. Bsivko (talk) 00:20, 30 December 2019 (UTC)
  • Strong oppose per my comments when this was discussed in 2016. These are not "invalid claims". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:07, 30 December 2019 (UTC)
  • @Bsivko: BsivkoBot appears to be editing without authorization; there isn't support for more edits here, and I don't see another permissions request. Please stop the bot, or I will have to block it --DannyS712 (talk) 02:58, 8 May 2020 (UTC)
  • As I see it, the current topic is still under discussion, and the functions above are switched off until it is resolved. For the extra functionality I'll open another thread. Bsivko (talk) 12:35, 8 May 2020 (UTC)

antoine2711bot[edit]

antoine2711bot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Antoine2711 (talk • contribs • logs)

Task/s: This bot will add data in the context of the Digital discoverability project of the RDIFQ (Q62382524).

Code: The work is done with OpenRefine, maybe a bit of QuickStatements, and maybe some API calls from Google Sheets.

Function details: Transfer data for 280 movies from an association of distributors.

--Antoine2711 (talk) 04:25, 2 July 2019 (UTC)

@Antoine2711: Is your request still supposed to be active? Do you have test/example edits? Lymantria (talk) 07:18, 17 August 2019 (UTC)
@Lymantria: No, it's mainly batch operations that I do myself. There is nothing automated yet, and there won't be for the project I've been working on for the last 9 months. --Antoine2711 (talk) 05:15, 2 March 2020 (UTC)
  • @Antoine2711, Lymantria: It seems to be unfinished, many items created are somewhat empty and not used by any other item: [32]. @‎Nomen ad hoc: had listed one of them for deletion. If the others aren't used or completed either, I think we should delete them. Other than that: lots of good additions. --- Jura 12:57, 8 September 2019 (UTC)
@Lymantria, Jura: The data I added all comes from a clean data set provided by distributors. I tried to do my best, but I might not have done everything perfectly. Someone spotted an empty item, and I added the missing data. If there are any others, I will make the same corrections.
My request for the bot is still pertinent as I will do other additions. What information do I need to provide for my bot request? --Antoine2711 (talk) 16:44, 8 September 2019 (UTC)
@Jura1: Sorry for not responding earlier. These people are team members on a movie, and I needed to create a statement with film crew member (P3092) and also a qualifier, object has role (P3831); in the case of Ronald Fahm (Q65116570), he's a hairdresser (Q55187). Ideally, I should also be able to push that into the description of this person, but I must be careful. I created around 1,500 persons, and I might have 200 still not connected. Do you see anything else? --Antoine2711 (talk) 03:37, 25 February 2020 (UTC)
Yesterday I looked into this request again and noticed that the problems I had identified 5 months ago were still not fixed. If you need help finding all of them, I could do so. --- Jura 08:46, 25 February 2020 (UTC)
@Jura1: yes, anything you see that I didn't do well, tell me and I'll correct it. If I create 500 items and make just 1% errors, that's still 5 bad item creations. So even though I'm careful, I'm still learning and making mistakes. I try to correct them as fast as I can, and if you can help me pinpoint problems, I'll fix them, like I did with everyone here. If you have SPARQL queries (or other ways of finding lots of data), let me know and don't hesitate to share them with me. Regards, Antoine --Antoine2711 (talk) 06:37, 2 March 2020 (UTC)
  • There's a deletion request for one of these items at Wikidata:Requests for deletions#Q65119761. I've mentioned a likely identifier for that one. Instead of creating empty items it would be better to find identifiers and links between items before creating them. For example Peter James (Q65115398) could be any of 50 people listed on IMDB - possibly nm6530075, the actor in Nuts, Nothing and Nobody (Q65055294)/tt3763316 but the items haven't been linked and they are possibly not notable enough for Wikidata. Other names in the credits there include Élise de Blois (Q65115717) (probably the same person as the Wikidata item) and Frédéric Lavigne (Q65115798) (possibly the same one but I'm not certain) and several with no item so I'm not sure if this is the source of these names. With less common names there could be one item that is then assumed to be another person with the same name. Peter James (talk) 18:29, 9 September 2019 (UTC)
Hi @Peter James: I think that most of these are now linked. For a few hundred, I still need to add the occupation. For the ones with little information, I also think I'm going to state the movies they worked on, which might help to identify those persons. I've also created links to given name and surname items, but that doesn't help much for identification. Note that I added a lot of IMDb IDs, and those are excellent. Do you have suggestions for me? Regards, Antoine --Antoine2711 (talk) 04:53, 2 March 2020 (UTC)
  • I suggest we block this bot until we see a plan of action for cleanup of the problems already identified. --- Jura 09:44, 24 February 2020 (UTC)
Cleanup seems to be ongoing. --- Jura 09:36, 25 February 2020 (UTC)
  • Comment I fixed writing system (P282) on several hundred family name items created yesterday, e.g. at [33]. --- Jura 09:42, 1 March 2020 (UTC)
I didn't know that "alphabet latin" (Latin alphabet (Q41670)) and "alphabet latin" (Latin script (Q8229)) were actually two different things. Thank you for pointing that out. --Antoine2711 (talk) 04:25, 2 March 2020 (UTC)
  • When doing that, I came across a few "last" names that aren't actually family names, e.g. H. Vila (for Rodrigo H. Vila), and listed them for deletion. You might want to double-check all the others. --- Jura 10:02, 1 March 2020 (UTC)
Yes, thanks for pointing that out. I'm also cleaning that up. --Antoine2711 (talk) 04:25, 2 March 2020 (UTC)
@Antoine2711: you seem to be running an unauthorized bot that is doing weird edits. Please explain. Multichill (talk) 15:46, 7 March 2020 (UTC)
@Multichill: Hi, yes, I did see those errors. I was cleaning them up yesterday and will continue today. This was an edit batch of 3,000 with about 3% mistakes. Even if 3% is not a lot, at those quantities I must be very careful. Unfortunately, I'm still learning. Please note that everything this bot does is supervised and launched by a human decision (which may be imperfect…). Regards, Antoine --Antoine2711 (talk) 17:12, 7 March 2020 (UTC)