Shortcuts: WD:RFBOT, WD:BRFA, WD:RFP/BOT

Wikidata:Requests for permissions/Bot

To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks. Then transclude that page onto this page, like this: {{Wikidata:Requests for permissions/Bot/RobotName}}.

Old requests go to the archive.

Once consensus is obtained in favor of granting the botflag, please post requests at the bureaucrats' noticeboard.

Bot name | Request created | Last editor | Last edited
William Avery Bot 7 | 2022-06-24, 12:01:19 | Epìdosis | 2022-06-24, 12:14:07
JerusalemcinemaBot | 2022-05-27, 22:11:15 | Jerusalemcinema | 2022-05-27, 22:38:55
William Avery Bot 6 | 2022-06-11, 10:14:43 | William Avery | 2022-06-24, 12:12:22
Crystal-bot | 2022-06-03, 00:18:50 | Stang | 2022-06-03, 00:42:10
OJSOptimetaCitationsBot | 2022-06-01, 07:37:16 | YucelGazi | 2022-06-01, 08:18:02
TFGBot | 2022-05-17, 19:22:16 | Lymantria | 2022-06-07, 07:52:21
AmeisenBot | 2022-04-14, 16:26:16 | Ameisenigel | 2022-06-07, 09:21:36
Botcrux 10 | 2022-04-05, 09:51:09 | ArthurPSmith | 2022-04-05, 18:31:33
AradglBot | 2022-03-14, 19:43:27 | Mahir256 | 2022-06-08, 20:14:59
PodcastBot | 2022-02-25, 04:38:31 | Trade | 2022-06-12, 18:34:52
Pi bot 10 | 2018-12-01, 22:10:53 | Mike Peel | 2022-01-25, 10:27:21
GretaHeng18bot | 2022-01-21, 18:38:00 | Gretaheng18 | 2022-03-04, 22:40:01
companyBot | 2022-01-09, 23:44:25 | Germartin1 | 2022-04-28, 04:47:17
StreetmathematicianBot 2 | 2021-11-20, 19:45:53 | Lymantria | 2022-01-19, 06:17:18
ConferenceCorpusBot | 2021-11-06, 15:42:55 | WolfgangFahl | 2022-05-28, 06:42:43
TAMISBot | 2021-08-27, 13:33:52 | Antoine2711 | 2022-06-09, 02:57:13
AmmarBot 4 | 2021-07-19, 10:56:46 | Mike Peel | 2022-01-18, 22:12:43
RonniePopBot | 2021-07-04, 10:06:32 | RonnieV | 2022-03-04, 15:41:52

William Avery Bot 7

William Avery Bot
Operator: William Avery

Task/s: Merge multiple references on the same claim citing Accademia delle Scienze di Torino

Code: https://bitbucket.org/WilliamAvery/wikipythonics/src/master/consolidateRefs.py

Function details: This script will clean up the problem raised at WD:RBOT § Accademia delle Scienze di Torino multiple references (01-05-2022). The SPARQL query given in that discussion lists items with multiple references sourced to Accademia delle Scienze di Torino on a single claim. None of these instances appears to be legitimate.

The script merges multiple references on the same claim that refer to stated in (P248) = www.accademiadellescienze.it (Q107212659), as long as there are no conflicting values within the references.

If any references then have both Accademia delle Scienze di Torino ID (P8153) and reference URL (P854), they will be dealt with manually.
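
A minimal pywikibot sketch of the consolidation step described above, assuming references are merged only when every shared property carries the same value; the actual script may handle conflicts and edge cases differently.

  import pywikibot

  STATED_IN = "P248"
  ACADEMY = "Q107212659"   # www.accademiadellescienze.it

  def consolidate(claim):
      """Merge all references on `claim` that cite the Accademia item, unless
      the references carry conflicting values for the same property."""
      refs = [r for r in claim.sources
              if any(s.getTarget() and s.getTarget().getID() == ACADEMY
                     for s in r.get(STATED_IN, []))]
      if len(refs) < 2:
          return
      merged = {}
      for ref in refs:
          for prop, source_claims in ref.items():
              kept = merged.setdefault(prop, [])
              for s in source_claims:
                  if any(k.getTarget() != s.getTarget() for k in kept):
                      return                      # conflicting values: handle manually
                  if not kept:
                      kept.append(s)
      claim.removeSources([s for ref in refs for ss in ref.values() for s in ss])
      claim.addSources([s for ss in merged.values() for s in ss],
                       summary="Consolidate duplicate Accademia delle Scienze references")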

Sample test edits made using the script under my account:

--William Avery (talk) 12:00, 24 June 2022 (UTC)[reply]

 Support of course. --Epìdosis 12:14, 24 June 2022 (UTC)[reply]

JerusalemcinemaBot

JerusalemcinemaBot
Operator: Jerusalemcinema

Task/s: Add new Wikidata records for events that happened in the Holy Land in the early 20th century.

Code: Still in development

Function details: --Jerusalemcinema (talk) 22:11, 27 May 2022 (UTC) The Israeli Film Archive (IFA) plans to release the data of its newsreels collection to Wikidata. The IFA holds a treasure trove of historical newsreels from the early 20th century until the mid-1970s, filmed mostly (but not only) in Mandatory Palestine, and later in the State of Israel. These 1,200 film reels comprise some of the earliest, rarest and most complete audiovisual documentation of the events and people who are a core part of Israel's rich and challenging history.[reply]

The newsreels have been digitized, tagged and enriched with information such as geographical coordinates, people and events depicted, date or year, and so on. We plan to enrich Wikidata with all of this information.

William Avery Bot 6

William Avery Bot
Operator: William Avery

Task/s: Increment Shakeosphere person ID by 24638, as discussed at WD:RBOT § Shakeosphere person ID

Code: https://bitbucket.org/WilliamAvery/wikipythonics/src/master/fixShakeosphere.py

Function details:

I scanned the items with this property, and the vast majority of the IDs require incrementing by 24638 to link to the relevant page on Shakeosphere (Q24284201). Nobody has expressed any desire to delete this property on the grounds of instability.

Items where the Shakeosphere ID has a reference with retrieved (P813) on or after 10 June 2022 are ignored, on the assumption that their Shakeosphere ID has already been corrected.

The script increments the existing ID by 24638, then attempts to retrieve the corresponding Shakeosphere page.

An attempt is made to match the title of the Shakeosphere page with the names and aliases on the Wikidata item.

The result of the matching is output to a report page to aid checking. (Example at User:William Avery Bot/Shakeosphere report)

Results of previously running this matching indicate that, in all cases where the new ID corresponds to a valid Shakeosphere page, the new ID is correct.

Any existing references on Shakeosphere ID are replaced with new references, including retrieved (P813) and subject named as (P1810), and the Shakeosphere ID is updated to the new value.
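
A condensed sketch of the increment-and-verify step, assuming a requests + pywikibot setup; the property ID and the Shakeosphere URL pattern below are placeholders rather than values from the actual script, and the P813 skip rule and reference rewrite are omitted.

  import re
  import requests
  import pywikibot

  OFFSET = 24638
  SHAKEOSPHERE_PID = "Pxxxx"   # placeholder: substitute the Shakeosphere person ID property
  PAGE_URL = "https://shakeosphere.example/person/{}"   # placeholder formatter URL

  def check_item(item):
      """Return (qid, new_id, verdict) for the report page."""
      item.get()
      claims = item.claims.get(SHAKEOSPHERE_PID, [])
      if not claims:
          return item.getID(), None, "no Shakeosphere ID"
      new_id = str(int(claims[0].getTarget()) + OFFSET)
      page = requests.get(PAGE_URL.format(new_id), timeout=30)
      if page.status_code != 200:
          return item.getID(), new_id, "no such Shakeosphere page"
      m = re.search(r"<title>(.*?)</title>", page.text, re.S | re.I)
      names = {item.labels.get("en", "")} | set(item.aliases.get("en", []))
      ok = m and any(n and n.lower() in m.group(1).lower() for n in names)
      return item.getID(), new_id, "title matches" if ok else "check manually"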

I have done a few test runs of the script against individual items under my own account:

--William Avery (talk) 10:13, 11 June 2022 (UTC)[reply]

If there are no objections, I will do a test run of 50 edits on 27/06/2022. William Avery (talk) 12:12, 24 June 2022 (UTC)[reply]

Crystal-bot

Crystal-bot
Operator: Stang

Task/s: Add MediaWiki page ID (P9675) and language of work or name (P407) qualifiers to items using Moegirlpedia ID (P5737) identifier.

Code: Gist1, Gist2

Function details: Pywikibot based. Generate the item list from a SPARQL query, retrieve page information via the Moegirlpedia API, then build and add the qualifiers. This affects 2,529 pages; it will be a one-time run, but may run periodically (daily or weekly) in the future. Special thanks to BorkedBot's lovely code. --Stang 00:18, 3 June 2022 (UTC)[reply]
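
A rough pywikibot sketch of the described workflow, showing only the page-ID qualifier (P407 would be added analogously); the Moegirlpedia API endpoint and the assumption that P9675 takes a string value are mine, not taken from the linked gists.

  import requests
  import pywikibot
  from pywikibot import pagegenerators

  MOEGIRL_API = "https://zh.moegirl.org.cn/api.php"   # assumed endpoint
  QUERY = """
  SELECT ?item WHERE {
    ?item p:P5737 ?st .
    FILTER NOT EXISTS { ?st pq:P9675 [] }
  }
  """

  site = pywikibot.Site("wikidata", "wikidata")
  repo = site.data_repository()

  for item in pagegenerators.WikidataSPARQLPageGenerator(QUERY, site=repo):
      item.get()
      for claim in item.claims.get("P5737", []):
          if "P9675" in claim.qualifiers:
              continue
          r = requests.get(MOEGIRL_API, params={"action": "query", "format": "json",
                                                "titles": claim.getTarget()}, timeout=30)
          page = next(iter(r.json()["query"]["pages"].values()))
          if "pageid" not in page:
              continue                                # page missing or deleted
          qual = pywikibot.Claim(repo, "P9675")       # MediaWiki page ID
          qual.setTarget(str(page["pageid"]))         # assumed string datatype
          claim.addQualifier(qual, summary="Add MediaWiki page ID qualifier")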

OJSOptimetaCitationsBot

OJSOptimetaCitationsBot
Operator: YucelGazi

Task/s: Add citation and author data for publications in journals hosted in Open Journal Systems

Code: https://github.com/TIBHannover/optimetaCitations

Function details: We are developing a plugin (OJSOptimetaCitations) for Open Journal Systems (OJS) which gives publishers/authors the possibility to edit the citations of publications/articles within OJS. The citations are enriched with data from OpenAlex.org and Crossref.org, currently based on DOIs; this matching will be extended in the future to other PIDs, such as URNs. The plugin also makes it possible to manually edit the citations within OJS. The authors in the citations are also enriched and editable within OJS; the matching for the authors is done by their ORCID iDs.

After this process, the citations will be published to Open Access websites such as Wikidata, OpenCitations, Crossref and so on.

On Wikidata, the plugin will search via the API, based on DOIs, for the publication/article and all citations. Depending on whether the article already exists on Wikidata, it will be updated or created. The citations will likewise be matched and added or updated, and then attached as claims to the main article item.

The same process will be done for the authors of the article and the authors of the citations. The authors will be matched via their corresponding ORCID iDs.
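
A minimal sketch of the DOI lookup against Wikidata, assuming a plain SPARQL query on DOI (P356); the plugin itself targets OJS (PHP), so this Python snippet is purely illustrative. Authors can be located the same way via ORCID iD (P496).

  import requests

  SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

  def find_article_by_doi(doi):
      """Return the QID of the item carrying this DOI, or None."""
      # DOIs are conventionally stored upper-cased on Wikidata, hence doi.upper()
      query = 'SELECT ?item WHERE { ?item wdt:P356 "%s" } LIMIT 1' % doi.upper()
      r = requests.get(SPARQL_ENDPOINT,
                       params={"query": query, "format": "json"},
                       headers={"User-Agent": "OptimetaCitations-sketch/0.1"},
                       timeout=30)
      bindings = r.json()["results"]["bindings"]
      return bindings[0]["item"]["value"].rsplit("/", 1)[-1] if bindings else None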

This plugin will be available through the official plugin gallery of OJS and will work with all OJS versions from 3.2 onwards, which covers around 25,000 installations at the time of writing.

Project links: Optimeta Project Page Optimeta Project Document

--YucelGazi (talk) 07:37, 1 June 2022 (UTC)[reply]

TFGBot

TFGBot
Operator: TFGBot

Task/s: I want to upload images to Wikidata to enrich the website.

Code:

Function details: It's basic: by obtaining images and descriptions in different languages, I want to upload them to Wikidata, mostly in Basque (euskara, eu). --TFGBot (talk) 19:22, 17 May 2022 (UTC)[reply]

  • Please create a dedicated account for the operator.
  • Images are uploaded to Commons, not Wikidata.

--GZWDer (talk) 22:24, 17 May 2022 (UTC)[reply]

 Oppose images should be at commons, not here. Lymantria (talk) 07:52, 7 June 2022 (UTC)[reply]

AmeisenBot

AmeisenBot
Operator: Ameisenigel

Task/s: Label unsigned comments on talk pages

Code: Python

Function details: Unlike many other wikis, Wikidata has no bot for adding {{Unsigned}} to unsigned comments on talk pages. Since it is very helpful to see who has left a comment on a talk page, I would like to propose a bot for this task. --Ameisenigel (talk) 16:25, 14 April 2022 (UTC)[reply]
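
A simplified sketch of the core check, assuming the text added by the newest revision can be isolated by a plain string difference; a real implementation (such as the gist linked in the discussion below) would diff revisions properly.

  import re
  import pywikibot

  SIG_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\)")

  def tag_if_unsigned(page):
      """Append {{subst:unsigned}} when the last revision added untimestamped text."""
      revs = list(page.revisions(content=True, total=2))
      if len(revs) < 2:
          return
      new, old = revs[0], revs[1]
      # crude "diff": only works for plain appends, which covers most talk replies
      added = new.text.replace(old.text, "", 1) if old.text in new.text else new.text
      if not added.strip() or SIG_RE.search(added):
          return
      tag = " {{subst:unsigned|%s|%s}}" % (new.user, new.timestamp.isoformat())
      page.text = page.text.rstrip() + tag
      page.save(summary="Bot: tag unsigned comment by " + new.user)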

@Ameisenigel: Is the code public, please? Asking out of curiosity, although this sounds like a very good idea. Thanks. Mike Peel (talk) 16:37, 14 April 2022 (UTC)[reply]
@Mike Peel: Not yet, but it is based on https://gist.github.com/zhuyifei1999/49af65a7f07fa950a381171ea037135e I should be able to make the code public soon. --Ameisenigel (talk) 16:51, 14 April 2022 (UTC)[reply]
Now public: https://gist.github.com/Ameisenigel/5974064e46af281e4053644e1c99f42e --Ameisenigel (talk) 16:31, 15 April 2022 (UTC)[reply]
Please make some test edits. Ymblanter (talk) 19:01, 20 April 2022 (UTC)[reply]
I support the concept. BrokenSegue (talk) 19:15, 23 April 2022 (UTC)[reply]
Note: per template documentation you should substitute the template.--GZWDer (talk) 20:14, 29 April 2022 (UTC)[reply]
Thanks, I have already noticed this. --Ameisenigel (talk) 16:16, 5 May 2022 (UTC)[reply]
Can you do a test run? --Lymantria (talk) 07:40, 7 June 2022 (UTC)[reply]
I have to make some improvements to the code first, because there are some technical difficulties. --Ameisenigel (talk) 09:21, 7 June 2022 (UTC)[reply]

Botcrux

Botcrux
Operator: Horcrux

Task/s: Change publication date (P577) of scientific articles from "1 January YYYY" to just "YYYY".

Problem description: Currently we have a lot of wrong statements for publication date (P577) of scientific articles. The error lies in the precision used for stating the date. The correct precision should be 9 (year), while the current precision is 11 (day).

To grasp the magnitude of the problem, please compare the number of items whose description contains "01 January" (2,200K+) with those containing "02 January" (7K+).

Function details: Once all the instances of scholarly article (Q13442814) with a 1 January publication date (P577) have been selected, the bot would simply:

  1. make edits like this one, changing all the datetimes from "+YYYY-01-01T00:00:00Z/11" to "+YYYY-00-00T00:00:00Z/9" (or "+YYYY-01-01T00:00:00Z/9", to remain consistent with the source); a sketch of this step follows the list.
  2. in a second pass, fix all the descriptions based on such wrong statements (example).
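
A minimal pywikibot sketch of step 1, assuming the items are fed in from the selection query; the actual run may go through QuickStatements instead, as noted below.

  import pywikibot

  site = pywikibot.Site("wikidata", "wikidata")
  repo = site.data_repository()

  def fix_precision(item):
      """Downgrade 1 January publication dates (P577) from day to year precision."""
      item.get()
      for claim in item.claims.get("P577", []):
          t = claim.getTarget()
          if t is None or t.precision != 11 or t.month != 1 or t.day != 1:
              continue
          claim.changeTarget(pywikibot.WbTime(year=t.year, precision=9),
                             summary="1 January is an import artefact; keep year precision")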

If possible, the bot would use QuickStatements for solving the issue in a reasonable time. --Horcrux (talk) 09:51, 5 April 2022 (UTC)[reply]

  • @Horcrux: This is definitely a problem that should be fixed, but what about those articles that legitimately were published on January 1? Can you filter based on how these articles were originally imported? ArthurPSmith (talk) 15:29, 5 April 2022 (UTC)[reply]
    @ArthurPSmith: This could be possible by inspecting all the HTML pages and looking for a match for "YYYY 1 Jan" in the header's citation. For instance, here the bot would change the date because there is no match in the citation of [1], while here the bot would keep the date because there is a match in the citation of [2]. --Horcrux (talk) 17:08, 5 April 2022 (UTC)[reply]
  • I noticed that we have to face a similar issue concerning precision 10 (month). For instance, please compare the number of items having "01 August" in the description (600K+) with "02 August" (10K+). Clearly, for January we have higher numbers because the two problems are mixed. Anyway, the solution for both is the same. --Horcrux (talk) 17:15, 5 April 2022 (UTC)[reply]
    Many articles are (or were) published in paper journals with issues that came out nominally on the first of each month, or some months. So many of those could well be correct. I know where I work there are some journals that have/had 1st of month and 15th of month issues, so both the 1st and 15th would give you high counts and other dates not so much. If it really was only a monthly issue then reverting to month precision may be fine, but if the journal has multiple issues per month then losing that 1 is not a good idea. ArthurPSmith (talk) 18:12, 5 April 2022 (UTC)[reply]
    @ArthurPSmith: This is indeed a good point, but if the source doesn't report the "1" either, why should we? --Horcrux (talk) 18:24, 5 April 2022 (UTC)[reply]
    As long as there's some check like that I guess it's reasonable. Anyway, the January 1 ones are clearly mostly wrong. ArthurPSmith (talk) 18:31, 5 April 2022 (UTC)[reply]

AradglBot

AradglBot
Operator: Aradgl

Task/s:

Create between 100,000 and 200,000 new lexemes in the Aragonese language (Aragonese (Q8765))

Code:

Function details: --Aradgl (talk) 19:43, 14 March 2022 (UTC)[reply]

Using a small program and the API, the bot will create new lexemes in Aragonese, specifying the lexical category, the language and some of its forms.

I have about 30,000 lexemes prepared and I have started uploading them.

In the coming weeks and months I hope to reach 100,000 to 200,000 new lexemes.
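
For illustration, a hedged sketch of creating one lexeme through the wbeditentity API; the lexical category, the session handling and the later form creation (e.g. via wbladdform) are assumptions, since the actual program is not published.

  import json
  import requests

  API = "https://www.wikidata.org/w/api.php"
  ARAGONESE = "Q8765"
  NOUN = "Q1084"          # assumed lexical category for this example

  def lexeme_payload(lemma, category=NOUN):
      """Build the wbeditentity data for one new Aragonese lexeme (no forms yet)."""
      return {
          "lemmas": {"an": {"language": "an", "value": lemma}},
          "language": ARAGONESE,
          "lexicalCategory": category,
      }

  def create_lexeme(session, csrf_token, lemma):
      # `session` must be a logged-in bot session; the CSRF token comes from
      # action=query&meta=tokens. Forms would be attached in a follow-up call.
      return session.post(API, data={
          "action": "wbeditentity", "new": "lexeme",
          "data": json.dumps(lexeme_payload(lemma)),
          "token": csrf_token, "format": "json", "bot": 1,
      }, timeout=30).json()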

  •  Oppose on principle, since senses (meanings) of these words, or links to references for each lexeme (such as to dictionary entries for these words, or other lexical identifiers for these words) are not also being provided. We already have massive backlogs of senseless lexemes for a bunch of languages (see the bottom of the first table); I will not support making this backlog inordinately larger. Mahir256 (talk) 20:58, 23 March 2022 (UTC)[reply]
We understand your observations. You are right that no meanings or links are provided at this stage. However, this is only natural since this is the beginning of a broader task that we are starting now.
Due to the lack of resources of a minority language such as Aragonese (spoken by fewer than 30,000 people), we believe this is the most sensible way to proceed: step by step. Moreover, Aragonese is on the brink of extinction according to UNESCO.
Undermining any effort to dignify its status will definitely speed up the death of the Aragonese language. On the contrary, we ask for support to promote our beloved language.
Thank you very much. Aradgl (talk) 18:46, 24 March 2022 (UTC)[reply]
  • @Aradgl: I'm not sure where you're getting that I'm interested in undermining Aragonese's dignity or speeding up the death of Aragonese. On the contrary, I'd love to see Aragonese thrive as an independent and flourishing tongue, but there should be just enough in that language's lexemes to begin with such that improvements to them, both from inside and outside the language community, are actually conceivable. Consider Breton lexemes: the language itself is also endangered, and most Breton lexemes currently do not have senses, but they do have links to Lexique étymologique du breton moderne (Q19216625), so that someone else (not necessarily a Breton speaker) can come by and at least add information based on that lexicon (@VIGNERON, Envlh:, who imported them). On the other hand, consider Estonian lexemes; an Estonian non-native speaker created a bunch of them over the course of a few days, all of them without senses, and most still sit as empty shells, with no clear way for non-Estonians to improve them and no indication that actual Estonian speakers even know they exist. I am happy to look around for references you could add to potential Aragonese lexemes, such that you can add some potential resource links based on them, but that is not a reason to begin importing them now without any such resources (especially since you have not indicated how/when you plan to add senses/resource links later). Mahir256 (talk) 20:01, 24 March 2022 (UTC)[reply]
    @Mahir256 Right now we are discussing our timetable in order to implement next steps within Wikidata, with the prospect of relating lexemes with concepts and meanings. We count on finishing the first phase by the end of 2022.
    By no means have we wanted to create lexemes as “empty shells”. We are working in a long-term project in order to provide valuable information for the sake of Aragonese language. We are working together with our Occitan counterparts (Lo Congrès) and in fact, we want to follow their example promoting further contributions from the community. Our reference is AitalDisem, a project initiated by Lo Congrès following its collaboration with Wikidata. This project is the direct continuation of the project AitalvivemBot. Aradgl (talk) 15:09, 25 March 2022 (UTC)[reply]
  • @Aradgl: I'll believe that you don't want to create empty shell lexemes, but I find it difficult to believe, given the prior examples of Russian, Estonian, Latin, and Hebrew lexemes, that they won't stay empty shells forever. If you are basing your work on the example of Aitalvivem, then (at least judging from that bot's contributions, which stopped in July 2019) you are not likely to be applying the right amount of attention to senses/resource linkages that would be desired, and (at least judging from the outcome of this bot request, from a user who disappeared after January 2020) you might disappear if prompted later about them.
You speak of wanting to add "valuable information for the sake of the language", but I fear that if there are no paths to this valuable information (with respect to the meanings of words) early on, then it is unlikely there ever will be such paths. If you are absolutely certain that existing printed/online references about Aragonese are not suitable/worthy of at least being linked to, and thus plan to essentially only crowdsource word meanings the same way the Occitan folks appear to have attempted, then what you could instead do (and what would change my opposition to a support) is have your system create lexemes only when an appropriate meaning has been added to that lexeme in that system by a community member, rather than creating lexemes with just the forms all at once waiting to be filled in on Wikidata. Mahir256 (talk) 15:37, 25 March 2022 (UTC)[reply]
  • @Mahir256: I'm the one who was supposed to continue the work on AitalvivemBot. Unfortunately, I have been suffering from long COVID since March 2020 and all my work has been postponed. But we still intend to add Occitan lexemes to Wikidata, if that is something you think can be useful. I thought that the purpose of Wikidata lexemes was to inventory words from languages. I never heard that we needed to add senses to them as a mandatory requirement. Is that the case now? If it is, of course we wouldn't disturb the work done in Wikidata by uploading a lot of words without senses. Minority languages, indeed, don't have a lot of human and financial means and we can't move forward at the speed the main languages do (you see it with Occitan: one person is sick and much work is postponed for years). Of course, we can't guarantee all the words we upload will be related to a meaning. But we intend to try with the poor means we have. On the other hand, all our words are from recognized dictionaries. Is that still interesting for Wikidata, or would it be better if we keep them for ourselves? Unuaiga (talk) 14:00, 28 March 2022 (UTC)[reply]
  • @Unuaiga: I'm sorry to hear that you have had long COVID this whole time—I sincerely hope you can recover! Please re-read my reply from 20:01, 24 March 2022 (UTC) above, and VIGNERON's comments below (in other words, you don't need senses if you can provide a way to add them later). Wikidata lexicographical data can do so much more than "inventory(ing) words from languages"; it's only appropriate that if more isn't done immediately after creating a lexeme, then opportunities for doing so (through the linkages of references) ought to be provided. My offer to find references re: Aragonese to Aradgl from 20:01, 24 March 2022 (UTC) above is extended to you re: Occitan. As for minority languages not moving as fast as main languages, I point you to the examples, in addition to Breton, of Hausa, Igbo, and Dagbani as under-resourced languages making lots of progress on lexemes. Mahir256 (talk) 14:23, 28 March 2022 (UTC)[reply]
    Thanks for your explanations. I will look at the languages you mention with great curiosity. Unuaiga (talk) 16:04, 28 March 2022 (UTC)[reply]
@Aradgl: this is a wonderful project but I have to agree with Mahir256: this doesn't seem ready yet (for Breton, after a ~4,000-lexeme import, and even with some information for the meanings, I estimated at least a year of weekly manual work to get good lexemes :/ this is already painful; 100,000 to 200,000 lexemes would be overwhelming).
I have some additional questions:
  • what is the source? And is it public or not? (In both cases, it would be better to indicate the source in the lexemes themselves.)
  • is your bot ready yet? If so, could you do some test edits (like creating 10 lexemes) so we can better see exactly what we are talking about and maybe provide some help.
Cheers, VIGNERON (talk) 13:23, 27 March 2022 (UTC)[reply]
@VIGNERON: It seems like the edits the requestor has been making in the Lexeme namespace of late resemble those described in this request. Mahir256 (talk) 16:09, 27 March 2022 (UTC)[reply]
@Mahir256: ah thanks, I looked at the bot's edits but not at the account behind the bot ;) Indeed, these lexemes are way too empty to be of any use. At the very, very least, you need to add a source (and ideally multiple). Maybe you can cross-check it with other datasets. I'm also wondering: why « between 100,000 and 200,000 »? Don't you have the exact number?
Also, I'm pinging @Fjrc282a, Herrinsa, Jfblanc, Universal Life: who speak Aragonese and might want to know about this Bot and maybe even want to help.
Cheers, VIGNERON (talk) 16:24, 27 March 2022 (UTC)[reply]
@Aradgl: Thoughts on VIGNERON's reply from 16:24, 27 March 2022 (UTC)? Mahir256 (talk) 20:14, 8 June 2022 (UTC)[reply]

PodcastBot

PodcastBot
Operator: Germartin1

Task/s: Upload new podcast episodes, extracting: title, part of the series, has quality (explicit episode), full work available at (MP3), production code, Apple Podcasts episode ID, Spotify episode ID. Regex extraction: talk show guest, recording date (from the description). It will be run manually and only for preselected podcasts. Code: https://github.com/mshd/wikidata-to-podcast-xml/blob/main/src/import/wikidataCreate.ts

Function details:

  • Read the XML feed
  • Read the Apple Podcasts and Spotify feeds
  • Get the latest episode date available on Wikidata
  • Loop over all new episodes that do not yet exist on Wikidata (see the sketch after this list)
  • Extract the data
  • Import to Wikidata using maxlath/wikidata-edit
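
The sketch below illustrates only the feed-reading and date-filtering steps; it is written in Python for brevity, whereas the actual bot is the linked TypeScript code using maxlath/wikidata-edit.

  import xml.etree.ElementTree as ET
  from email.utils import parsedate_to_datetime
  import requests

  def new_episodes(feed_url, latest_on_wikidata):
      """Yield (title, pubdate, mp3_url) for feed items newer than the newest
      episode date already on Wikidata (pass a timezone-aware datetime)."""
      root = ET.fromstring(requests.get(feed_url, timeout=30).content)
      for entry in root.iter("item"):
          pubdate = parsedate_to_datetime(entry.findtext("pubDate"))
          if pubdate <= latest_on_wikidata:
              continue
          enclosure = entry.find("enclosure")
          mp3 = enclosure.get("url") if enclosure is not None else None
          yield entry.findtext("title"), pubdate, mp3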

--Germartin1 (talk) 04:38, 25 February 2022 (UTC)[reply]

  • Comment: What is your plan for deciding which episodes are notable? Ainali (talk) 06:40, 21 March 2022 (UTC)[reply]
  •  Oppose for a bot which would do a blanket import of all Apple or Spotify podcasts. ChristianKl❫ 22:46, 22 March 2022 (UTC)[reply]
    • Have a look at the code, it's only for certain podcasts and will run only manually. Germartin1 (talk) 05:12, 23 March 2022 (UTC)[reply]
      • @Germartin1: Bot approvals are generally for a task. If that task is more narrow, that shouldn't be just noticeable from the code but be included in the task description. ChristianKl❫ 11:39, 24 March 2022 (UTC)[reply]

How about episodes of podcasts with a Wikipedia article? @Ainali:--Trade (talk) 18:34, 12 June 2022 (UTC)[reply]

Pi bot 10

Pi bot
Operator: Mike Peel

Task/s: Create new Wikidata items for people with Commons categories

Code: Available on BitBucket

Function details: The code looks through commons:Category:People by name (and also commons:Category:Women by name and commons:Category:Men by name in a later version) to find categories about humans that don't have Wikidata items. It then searches Wikidata for the name to see if an item for that person might already exist, and skips it if there is a candidate item that hasn't been declined through the Commons category matches in the Distributed Game. Otherwise, it creates a new item, and adds the commons sitelink, instance of (P31)=human (Q5), and if available it also sets sex or gender (P21), date of birth (P569) and date of death (P570), all with imported from Wikimedia project (P143)=Wikimedia Commons (Q565) as a reference. That information is then shown in Commons through the Wikidata Infobox, which will hopefully lead to editors adding more information about the person in the future. Thanks. Mike Peel (talk) 22:10, 1 December 2018 (UTC)[reply]
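
A condensed pywikibot sketch of the item-creation step, assuming the candidate search and the Distributed Game decline-check have already passed; the optional P21/P569/P570 statements are omitted here.

  import pywikibot

  site = pywikibot.Site("wikidata", "wikidata")
  repo = site.data_repository()

  def create_person_item(name, commons_category_title):
      """Create a new item for a Commons people category with no existing match."""
      item = pywikibot.ItemPage(repo)
      item.editLabels({"en": name}, summary="Create item for Commons person category")
      item.setSitelink({"site": "commonswiki", "title": commons_category_title},
                       summary="Add Commons sitelink")
      claim = pywikibot.Claim(repo, "P31")               # instance of
      claim.setTarget(pywikibot.ItemPage(repo, "Q5"))    # human
      item.addClaim(claim)
      source = pywikibot.Claim(repo, "P143")             # imported from Wikimedia project
      source.setTarget(pywikibot.ItemPage(repo, "Q565")) # Wikimedia Commons
      claim.addSources([source])
      return item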

Examples at Håkon Aase (Q59342509), Morten Aass (Q59342511), Maria Pavlovna Abamelik-Lazareva (Demidova) (Q59342592), Juan Pablo Abarzúa (Q59342596). This might run weekly. Thanks. Mike Peel (talk) 22:28, 1 December 2018 (UTC)[reply]
I am going to approve the bot in a couple of days provided no objections have been raised.--Ymblanter (talk) 20:45, 3 December 2018 (UTC)[reply]

{{Approved}}--Ymblanter (talk) 22:02, 5 December 2018 (UTC)[reply]

Reboot

I would like to restart this bot task. When I ran it after the bot approval in 2018, it was subsequently blocked by @Jean-Frédéric, Multichill: because it was "Mass creating items that fail Wikidata:Notability". The bot was unblocked per [3]. We subsequently had a discussion about this at Wikidata:Project_chat/Archive/2019/01#Creating_new_items_for_humans_based_on_Commons_categories, where there seemed to be general support for this task, but concerns about notability. The currently running RfC at Wikidata:Requests for comment/Creating new Wikidata items for all Commons categories also seems to generally support this work.

I still don't have a good solution for avoiding the creation of new items for people that the Wikidata community considers non-notable even though they have a Commons category. In those cases, the Commons community has either decided that they are notable, or hasn't spotted the existence of the category. I think the workflow for handling these cases has changed since 2018: nowadays it is more accepted that the first step to resolve them is to nominate the images and category for deletion on Commons, and, if those are deleted, to then nominate the item here for deletion. So I think there is now a clear cross-wiki process for handling non-notable cases.

I think it's important to bring this data onto Wikidata: it benefits Wikidata directly for the notable person items that are created and linked to multimedia resources, and it benefits Commons with multilingual content and auto-categorisation. Thanks. Mike Peel (talk) 21:00, 22 January 2022 (UTC)[reply]

No, Wikidata:Requests for comment/Creating new Wikidata items for all Commons categories doesn't support this. In that RFC you asked the community what they think about "All Wikimedia Commons categories should have a Wikidata item" and the clear answer was: The majority of the Wikidata community  Opposes this.
A person having a category on Commons doesn't make them notable. You'll end up creating a ton of not notable items again. Multichill (talk) 23:28, 22 January 2022 (UTC)[reply]
I can only agree with Multichill here. In the RfC you mentioned, there is no consensus for creating items for people who have a category on Commons. Therefore I reject the commissioning of this bot.  Oppose --Gymnicus (talk) 08:40, 25 January 2022 (UTC)[reply]
Note that both people opposing here, also opposed in the RfC, without really giving good reasons. Read the other comments there, they are generally opposing the creation of items for combination categories, but there was more support for creating items for individual topics, including humans. In particular see the village pump discussion. Also, note that I'm not aware of any of the items created when I was running this bot before, having been deleted or otherwise having turned out to be controversial over the last few years. Thanks. Mike Peel (talk) 10:27, 25 January 2022 (UTC)[reply]

GretaHeng18bot, 1

GretaHeng18bot
Operator: Gretaheng18

Task/s: The bot will create San Diego State University faculty profiles and create items for departments and colleges on Wikidata. It will also create relationships among SDSU colleges, departments, and faculty.

Code: Haven't established one for this project but will update it here: https://github.com/gretaheng

Function details: This bot is mainly used for the library's faculty profile project. Here on Wikidata, it is used to create new items and add links between new and existing SDSU faculty, departments, and colleges. --GretaHeng18bot (talk) 18:37, 21 January 2022 (UTC)[reply]

Discussion
  •  Support if by faculty you mean working groups or something other than a person that can be linked to.--So9q (talk) 14:31, 3 February 2022 (UTC)[reply]
 Support Prahlad (tell me all about it / private venue) (Please {{ping}} me) 15:17, 15 February 2022 (UTC)[reply]
Hi @So9q. I am working on creating Wikidata items for departments, faculty (colleges/schools), and faculty (professors) and will link them together (people link to the departments, departments link to the college). I have now finished the departments and faculty (college/school) part and am working on the data model for faculty (professors). Our library is collecting faculty ORCID iDs, and I will add them to the faculty items on Wikidata once we have them. There are also some faculty works (books/papers) on Wikidata that were not created by my institution. We'd like to change the creator information from a string to a Wikidata item in the future. Hope this makes sense. GretaHeng18bot (talk) 19:08, 15 February 2022 (UTC)[reply]
Are random faculty members notable? I think only in the case they have published at least one scientific work. I hope that Wikidata will get full coverage of all the world’s scientific publications, and I would love to have more authors with orcid in WD, but only if they actually published something. Could you check if they published anything before importing?—So9q (talk) 03:57, 16 February 2022 (UTC)[reply]
Thanks for the concern. I think most of them have published some works, which I will verify. Since this project only focuses on tenured, tenure-track, and emeritus faculty (I should have brought this up earlier), and publication is a requirement for tenure, I am confident about their publications.
We have the ORCID iD project and this project going on at the same time to try to build the foundations of scholarly communication. We plan to migrate an institutional repository that has linked-data capabilities (setting up a test trial now). Creating Wikidata items for the faculty in the repository will allow us to link faculty works to the faculty's Wikidata items in the future.
Let me know if you have further questions! Gretaheng18 (talk) 18:53, 18 February 2022 (UTC)[reply]
  • I will approve the bot in a couple of days provided no objections have been raised.--Ymblanter (talk) 20:00, 15 February 2022 (UTC)[reply]
  • Looking at the two first items I came across: Q110892606 and Q110892602, their part of (P361) statements seem problematic. Isn't there a better way to express this? --- Jura 12:44, 16 February 2022 (UTC)[reply]
    I borrowed some properties from Stanford University Libraries' data model for academic faculty/staff; they use part of as well to express the relationship between faculty members (humans, not institutions) and departments (https://www.wikidata.org/wiki/Wikidata:WikiProject_Stanford_Libraries/Data_models#Core_description). When I researched which property to use, there was a more specific alternative: academic appointment (P8413). But I think part of is more flexible. Some faculty may volunteer to work in a committee or a working group that is not appointed. They may also be a lab manager for a lab they got a grant for. Part of can be applied to all these cases. If you are not comfortable with the general part of (P361) property, do you think affiliation (P1416) works, which is a subproperty of part of? Gretaheng18 (talk) 18:28, 18 February 2022 (UTC)[reply]
    Not sure where that page comes from. Once one tries to add the inverse has part or parts (P527), it seems fairly clear that it doesn't work well for organizations.
    As a general rule, if there is a more specific property available, that should be used. As I don't quite know what the link is between the values and all the persons you plan to add, I can't recommend one over the other. affiliation (P1416) is likely to work in most contexts. subproperty of (P1647) can help determine that, but I think its use is somewhat limited in Wikidata and not necessarily conclusive. --- Jura 07:19, 20 February 2022 (UTC)[reply]
    Thanks for the suggestions. I will discuss the data model with my colleagues! Gretaheng18 (talk) 16:17, 21 February 2022 (UTC)[reply]
    I updated the data model here: https://www.wikidata.org/wiki/Wikidata:WikiProject_PCC_Wikidata_Pilot/San_Diego_State_University/SDSU_Institutional_Data_Project#Extended_description. Let me know if you have any questions. I will run a few tests over the weekend. Gretaheng18 (talk) 22:40, 4 March 2022 (UTC)[reply]

companyBot

companyBot
Operator: WikiFan2100

Task/s: Updates ~1,000 Wikidata items (all companies) with Central Index Key, Bloomberg ID, Crunchbase ID, and official website properties. I checked, and it should make 1,651 edits.

Code: https://github.com/emg89/wiki-update

Function details: This is a simple script that first checks whether a company already has the property, and if not, adds the property and value (as listed in the CSV file in the GitHub repo). --WikiFan2100 (talk) 23:44, 9 January 2022 (UTC)[reply]
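
A hedged pywikibot sketch of that check-then-add loop; the CSV column names (qid, property, value) are assumptions, since the actual file layout lives in the linked repo.

  import csv
  import pywikibot

  site = pywikibot.Site("wikidata", "wikidata")
  repo = site.data_repository()

  def add_missing_ids(csv_path):
      """Add an identifier claim only when the item does not already carry it."""
      with open(csv_path, newline="") as fh:
          for row in csv.DictReader(fh):
              item = pywikibot.ItemPage(repo, row["qid"])
              item.get()
              if row["property"] in item.claims:
                  continue                       # property already present, skip
              claim = pywikibot.Claim(repo, row["property"])
              claim.setTarget(row["value"])
              item.addClaim(claim,
                            summary="Add %s from curated company dataset" % row["property"])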

  • Seems similar to my request that I never followed up on (see Wikidata:Requests_for_permissions/Bot/BorkedBot_3). Definitely something I'd like to see done. I didn't do it because the joins turn out to be a little tricky to do automatically. In particular using just stock tickers can be unreliable because we don't always have good start times. Looking at your git repo you wrote that you looked all this up yourself? If so I don't think a bot approval is needed since it's really all manually done (i.e. not using code). I would like to see references added to indicate where you are getting this data from if possible. (Since you aren't using pywikibot you might want to have your bot emit quickstatement commands instead of directly editing via the API because then it is simple to add things like references.). BrokenSegue (talk) 02:32, 10 January 2022 (UTC)[reply]
  • Yes, this is from a database I have manually created for work, so it's very clean and accurate. I've always had the thought that I would like to contribute it to Wikipedia. Regarding Quickstatements, I wasn't familiar with that tool so I'll definitely take a look at that (thanks). Regarding references, I suppose all of the identifiers I am looking to add are kind of references themselves, as they can be appended to a url that will take you to the page for that company. For example, if you go to the wikidata page for Stripe and click on the CIK it will automatically take you to the SEC's page for Stripe thanks to the formatter url for the CIK property (same for crunchbase and bloomberg IDs). Also, definitely interesting what you were trying to do with BorkedBot_3, it would be great to have more of this information in wikipedia. I spend a lot of time on the SEC's website for work so I'll try to think if there is any way I can help out there. WikiFan2100 (talk) 01:02, 11 January 2022 (UTC)[reply]
    Cool. You can see the source code for my bot that I just never really ran very long here. Though don't run it as it as-is because it has known correctness issues. My main concern about references is "how do you know this CIK maps to the item" (as opposed to, say, a different item with the same name). If the answer is "I did it by hand" then no reference is needed. If you looked it up using another identifier (like the ticker) or used some joining technique then a reference would be helpful. Generally this all LGTM. We should approve this task. BrokenSegue (talk) 03:32, 11 January 2022 (UTC)[reply]
Ideally we would address Property_talk:P3242#scope_and/or_datatype before adding more of them, but then 1000 isn't that much. --- Jura 11:49, 18 January 2022 (UTC)[reply]

StreetmathematicianBot 2

StreetmathematicianBot
Operator: Streetmathematician

Task/s: Use the Crossref API to turn author name string (P2093) statements into disambiguated author (P50) statements based on ORCID iDs.

Code:

Function details: This task does not create new items. It adds statements linking existing author items to existing article items.

Bot operation: Where

  • there is an item with a best DOI
  • there is another item with a best ORCID iD
  • Crossref API data associates the DOI with the ORCID iD
  • the existing items are not linked
  • all names match*: Crossref's first+family name, the author item's en: label, and the author name string of the article item
  • the positions of the author in the lists of authors also match

the bot will:

The Crossref API and the Crossref public data file are free to use for any purpose and their results are not covered by copyright, with the exception of abstracts (abstracts are not used for this task).

The quality of the data provided by the Crossref API seems good enough to me to allow adding the new statements at normal rank.
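
A small sketch of the Crossref side of the check, assuming the public REST API; the name- and position-matching against the Wikidata item label and the author name string, listed in the conditions above, is not shown.

  import requests

  def crossref_orcids(doi):
      """Return [(position, 'Given Family', orcid), ...] for the authors that
      Crossref associates with this DOI and that carry an ORCID iD."""
      r = requests.get("https://api.crossref.org/works/" + doi,
                       headers={"User-Agent": "orcid-matching-sketch/0.1"},
                       timeout=30)
      out = []
      for pos, author in enumerate(r.json()["message"].get("author", []), start=1):
          if "ORCID" not in author:
              continue
          name = " ".join(p for p in (author.get("given"), author.get("family")) if p)
          out.append((pos, name, author["ORCID"].rsplit("/", 1)[-1]))
      return out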

I will perform test edits and link to them if there are no objections.

As the code currently stands, the bot will visit an article several times if several of its authors can be disambiguated. This is suboptimal for articles with many authors, since it will result in lengthy edit histories.

Footnotes:

  • Precise name matching will be used for the first runs. If there are no problems, less-precise matching (missing periods after initials; accents and diacritics missing in one version of the name but not the other; names that differ only in hyphenation of their components) may be useful, but I'm happy to restrict this request to the precise matches and create another request for less-precise matching once I can give full details on how it will operate.
  • Whether author name string (P2093) statements should really be removed is currently being discussed on WD:PC. However, it should be pointed out that even if they are removed, the intention is that no information is lost in the process, and it will thus be possible to restore the author name string (P2093) statements at a later point automatically.

--Streetmathematician (talk) 19:45, 20 November 2021 (UTC)[reply]

sounds good to me BrokenSegue (talk) 02:02, 21 November 2021 (UTC)[reply]
5 test edits. Streetmathematician (talk) 08:13, 21 November 2021 (UTC)[reply]
@ArthurPSmith: you commented at WD:PC, but I hope it's okay to respond here:
  • There are many articles with several matching author name strings. Some of those are mistakes, others appear legit. I would suggest to skip such articles for now, since I believe they need human attention.
  • My plan is to start with exact name matches. After that:
    • punctuation, capitalization, diacritics, and whitespace changes are worth handling automatically, I believe, but care must be taken to preserve or add all variants in use
    • I'm not sure about name order and expanded names vs initials vs omitted names, which may cause a small number of false positives
    • stemming, Damerau-Levenshtein neighbors (typos, minor spelling differences) and further transformations: as suggestions for semi-manual edits only
  • Just to clarify, I'm not using data from ORCID, just ORCID iDs provided by Crossref and ORCID iDs already in Wikidata. Streetmathematician (talk) 07:37, 23 November 2021 (UTC)[reply]
  • @Streetmathematician: There are many subtleties in name-matching, and some previous bots doing this sort of thing have done it poorly. Examples of issues beyond punctuation/hyphenation/capitalization are: (1) Handling of suffixes: Jr, III, etc., (2) The source for many author name strings (pubmed I think) often reverses the name so last name is first, initials strung together after, eg. "Smith AP". (3) Spanish and other names with multiple "family name" components, with one source having only one of the family names as "last name" - "Jose Garcia Hernandez" may need to match a last name of "Garcia" for example. (4) Chinese and other names where the family name is often first, but in western scientific publications often reversed as an author name, and the given name is often two syllables separated by a hyphen, or sometimes joined together depending on the source: "Wang Wei-Min" might be also "Wei-Min Wang", "Wei Min Wang", "Weimin Wang", 'Wang WM', 'W.-M. Wang', 'W. Wang', 'W-M. Wang', etc. etc. Anyway, if this request is sticking to just exact matching of the name string and avoiding cases where there are multiple matches then that should be fine for now; it will certainly be a big help to start with and we can revisit the other issues later. ArthurPSmith (talk) 15:58, 23 November 2021 (UTC)[reply]
    Thank you. I agree that matching names is very difficult. I'm proposing to do it only as a safety check to catch the (regrettably common) case in which our sources are inconsistent (so it's "is it implausible those two names refer to the same person" not "here are two lists of a million names, find out who's who"). Nevertheless, I would like to amend my original proposal to be restricted to exact matches only. Also, I've come across a few articles with very many authors, and I'd also like to leave those for later. Streetmathematician (talk) 16:53, 23 November 2021 (UTC)[reply]

@Streetmathematician: This seems to be stale, is this still active? Perhaps @Ymblanter, Lymantria: could comment? Thanks. Mike Peel (talk) 22:18, 18 January 2022 (UTC)[reply]

The plan seems good to me, I would like to see some more test edits, say 100. Lymantria (talk) 06:17, 19 January 2022 (UTC)[reply]

ConferenceCorpusBot

ConferenceCorpusBot
Operator: WolfgangFahl

Task/s: Import scientific events and event series from diverse sources, e.g. dblp

Code: https://github.com/WolfgangFahl/ConferenceCorpus

Function details: The intended functionality builds on the preparations done by Simon Cobb using OpenRefine based on the discussions in:

https://github.com/SmartDataAnalytics/OpenResearch/discussions/127

So far some 4000 event series have been imported this way and linked to dblp, Microsoft Academic Knowledge Graph, GND and so on.

See https://confident.dbis.rwth-aachen.de/dblpconf/wikidata for a list of event series available in wikidata.

The list of scientific events is unfortunately not as complete as the list of series, and this bot intends to remedy the situation. In the end there should be proper scientific event / proceedings pairs instead of the many proceedings entries that do exist but have no link to the corresponding conference. See https://github.com/SmartDataAnalytics/OpenResearch/discussions/162 for an overview or https://diagrams.bitplan.com/render/svg/0x589f1ec0.svg

--WolfgangFahl (talk) 15:42, 6 November 2021 (UTC)[reply]

What do you want to import? Conferences, individual editions of conferences, or articles published there?--GZWDer (talk) 15:57, 6 November 2021 (UTC)[reply]
  • this request feels a little vague. are you creating new items? modifying existing items? can we see a sample? BrokenSegue (talk) 16:44, 6 November 2021 (UTC)[reply]

Examples

Today I imported some events and corresponding proceedings with OpenRefine: see https://scholia.toolforge.org/event-series/Q64852380 and my contributions page https://www.wikidata.org/wiki/Special:Contributions/WolfgangFahl. The sources are: https://confident.dbis.rwth-aachen.de/or/index.php?title=K-CAP where you'll find pointers, e.g. to https://dblp.org/db/conf/kcap/index.html. We intend to create a bot that does similar work without using OpenRefine, and with better quality, by doing better matching against different sources/references.

See also Proceedings Title Parser for a matching source we used until recently that only matches via acronyms. The corresponding event series can be found at: https://confident.dbis.rwth-aachen.de/or/index.php?title=TPDL and https://scholia.toolforge.org/event-series/Q5412433

The goal is to complete the event / proceedings pairs and link them via P4745 (is proceedings from).

New items will be created when missing and existing items amended if e.g. library references of k10plus, GND, dblp are missing. There is a core set of information that will be used as an "event signature":

  • acronym
  • year
  • title
  • location
  • country
  • starttime
  • endtime
  • homepage

and those are mapped to the appropriate wikidata properties as shown in the examples. WolfgangFahl (talk) 15:59, 9 November 2021 (UTC)[reply]
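
For illustration, one plausible mapping of the signature fields to Wikidata properties; the actual bot's mapping is defined in the linked examples and may differ, so treat the property choices below as assumptions.

  # Assumed property mapping for the "event signature" fields listed above.
  EVENT_SIGNATURE_PROPERTIES = {
      "acronym": "P1813",    # short name
      "title": "P1476",      # title
      "location": "P276",    # location
      "country": "P17",      # country
      "starttime": "P580",   # start time
      "endtime": "P582",     # end time
      "homepage": "P856",    # official website
  }

  def signature_statements(signature):
      """Turn an event-signature dict into (property, value) pairs; fields without
      a mapped property (e.g. 'year', usually implied by the start time) are skipped."""
      return [(EVENT_SIGNATURE_PROPERTIES[key], value)
              for key, value in signature.items()
              if key in EVENT_SIGNATURE_PROPERTIES and value]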

  • ok. sounds like the bot isn't implemented yet? are all the details ironed out? is the source anywhere? do you know how many edits you'll be making? by the way you should actually register the Bot user account User:ConferenceCorpusBot so nobody can steal it (and probably make the test edits under that account in the future). generally this seems like a good bot though. BrokenSegue (talk) 23:24, 9 November 2021 (UTC)[reply]

@WolfgangFahl, BrokenSegue: This seems to be stale, is this still active? Perhaps @Ymblanter, Lymantria: could comment? Thanks. Mike Peel (talk) 22:17, 18 January 2022 (UTC)[reply]

Please stay tuned; see e.g. WEBIST OpenResearch Series versus WEBIST Series in Wikidata (Scholia) versus WEBIST series information from different datasources versus WEBIST from different datasources (JSON). Currently the process is still semi-automatic and we are using OpenRefine to fill Wikidata. We now need to implement the backend to make sure the bot functionality will be achieved. WolfgangFahl (talk) 06:39, 19 January 2022 (UTC)[reply]
  • Support but I recognize that more documentation and community review are helpful. I invite this tool and its community to Wikidata:WikiProject Events, where anyone can show examples, host discussion, and build out documentation. There is already a data model for conferences documented there, so if the tool can follow that model, then use this as a community hub for explaining the tool. I posted some guidance for next steps at Wikidata_talk:WikiProject_Events#Bot_for_importing_conference_data. Somehow we need to see and discuss some test data to proceed, and again, I suggest that the Events WikiProject is an appropriate place for central conversation. Let me know if I can assist with organizing conversation and review. Bluerasberry (talk) 18:04, 28 January 2022 (UTC)[reply]
    Yes please. Today i started a semi-automatic import of https://scholia.toolforge.org/event-series/Q105692764 which is an interesting example since it's incomplete in dblp. WolfgangFahl (talk) 08:50, 20 April 2022 (UTC)[reply]

@Bluerasberry, WolfgangFahl: A couple dozen events have been imported using a semi-automated approach in the meantime. The bits and pieces of the infrastructure are now available. How do I proceed from here?--WolfgangFahl (talk) 06:42, 28 May 2022 (UTC)[reply]


TAMISBot

TAMISBot
Operator: ChristianBRoy

Task/s: TAMISBot will import data about books using the WikiProjet Livres model.

Code: n/a

Function details:

  • The bot parses data from an ONIX data source (ONIX is the metadata standard for the book industry)
  • The data from the ONIX source is provided by the book publishers, who gave their agreement for data usage and upload to Wikidata
  • The bot tries to find existing Wikidata items based on ISBN, title and author
  • A human operator reviews the matches (and non-matches), makes corrections if needed, and confirms that the bot can do its job without creating duplicates
  • The bot creates new items when needed, or adds claims to existing items
  • It does not duplicate claims when they already exist
  • The created or updated items are all written work (Q47461344), version, edition, or translation (Q3331189) and author (Q482980)

--ChristianBRoy (talk) 13:33, 27 August 2021 (UTC)[reply]
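
As an illustration of the first steps in the workflow above, a minimal sketch of pulling ISBN-13s out of an ONIX file, assuming ONIX 3.0 reference tags (ProductIDType code 15 = ISBN-13); matching against Wikidata would then go through ISBN-13 (P212) on version, edition, or translation (Q3331189) items.

  import xml.etree.ElementTree as ET

  def isbn13s(onix_path):
      """Extract ISBN-13 values from an ONIX file, ignoring XML namespaces."""
      found = []
      for _, elem in ET.iterparse(onix_path):
          if elem.tag.endswith("ProductIdentifier"):
              id_type = id_value = None
              for child in elem:
                  if child.tag.endswith("ProductIDType"):
                      id_type = (child.text or "").strip()
                  elif child.tag.endswith("IDValue"):
                      id_value = (child.text or "").strip()
              if id_type == "15" and id_value:       # 15 = ISBN-13 in ONIX code list 5
                  found.append(id_value)
      return found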

  • How many new items? What about which books? --- Jura 13:58, 30 August 2021 (UTC)[reply]
    • As an estimate, on the short-to-mid term (end of 2021 or beginning of 2022): potentially approx. 500 books (I believe 95%+ will be new items), two editions each (one paper and one ebook edition), so ~1,000 more new items for editions, and one or two authors per book (a lot of the authors already exist, so that part should not create that many items). The books are published by Canadian publishers, all of whom publish in French. They include publishers of fiction (for instance Éditions du Boréal (Q3579629), Éditions Alto (Q16684371) and Hurtubise (Q3579355)), non-fiction (Éditions du Septentrion (Q3579664), Q3122128), and a university press (Presses de l'Université de Montréal (Q21428306)). --ChristianBRoy (talk) 18:06, 30 August 2021 (UTC)[reply]
    • How are they selected? We already had a proposal for Quebec depôt légal import, but Wikidata isn't quite equipped for that. --- Jura 22:28, 30 August 2021 (UTC)[reply]
      • Books will be related to Québec City (either because of their author or publisher, or because the content of the book is linked to the city by main subject (P921) or narrative location (P840)). The motivation is the city's recognition as a City of Literature (Q3467764), part of the UNESCO Creative Cities Network. The scope is thus a lot smaller than the dépôt légal. --ChristianBRoy (talk) 13:16, 31 August 2021 (UTC)[reply]
      • I'm not really convinced by using the publisher as criterion .. essentially it would be the depot legal of a given publisher. The idea to build a comprehensive bibliography about Quebec City (topic or narrative location) seems more interesting. --- Jura 11:46, 1 September 2021 (UTC)[reply]
      • Understood. At the same time, having local publishers is part (in my understanding) of the UNESCO criteria. We could begin with "important" books published by those publishers... for instance, Nikolski (Q3341721) is important because it was a game changer for Éditions Alto (Q16684371) during their launch. In that case our bot would add links between the book and the publisher (currently there is no reference at all), and would create version, edition, or translation (Q3331189) items in order to make the ISBNs known in Wikidata. We could also choose to limit our project to books written by authors who already have an item on Wikidata (in which case we would just make sure to add notable work (P800) claims to the author). --ChristianBRoy (talk) 13:48, 1 September 2021 (UTC)[reply]
        • I don't see how UNESCO is relevant to this bot request. There are other websites such as Worldcat, that aim to include all books. notable work (P800) is not to list every edition or work of an author. --- Jura 05:37, 12 September 2021 (UTC)[reply]
        • Sorry about the UNESCO reference, this was a follow up to my answer to your previous question about the selection of books. I understand that notable work (P800) is not to list every edition or work of an author. However, it can be used to list works that are notable, and some are clearly missing in Wikidata. For instance, Jacques Lacoursière (Q674235) is a famous historian, who has received multiple awards in Canada and in France, but there is no entry about his most notable works (which we would add). ChristianBRoy (talk)

@Jura1: I think you see it the wrong way. This user, ChristianBRoy, has a project with partners that are publishers in Québec. He has access to valuable and verified data about the books these publishers make. Also, using AI and manual work, this project is looking to add to books information that is usually very hard to come by, like main subject (P921), award received (P166), based on (P144), inspired by (P941), characters (P674) and narrative location (P840), and that we lack on WD. At least 50% of his books will already be on WD, so he will be enriching existing WD elements for most of his work.

I'm the one who recommended that he creates a bot for his automated modifications. I'm not really sure I understand the reservation you are expressing here. Regards, Antoine2711 (talk) 20:08, 13 September 2021 (UTC)[reply]

  • What guidance on notability did you provide? --- Jura 09:20, 14 September 2021 (UTC)[reply]
First of all, all the publishers are ALREADY in Wikidata. There are 10 or 15, and the maximal scope in the next 10 years is the publishers in Quebec, so maybe 1,000. But he is far from there. His current project, limited to 10 publishers in Quebec City for now, is about enriching data for books published by these publishers. So the notability here is clear. I take for granted that everybody knows that a publisher publishes books, so adding information about the books of existing publishers is pertinent in Wikidata. He will add data to already notable Wikidata publishers. I think the fact that he uses AI to extract and identify property data not often defined for books is also in itself a good reason to welcome this bot and encourage ChristianBRoy to be part of the Wikidata community and be a good and reliable contributor. I know I'm doing my best to show him what I've learned myself here. Can you also support him? Antoine2711 (talk) 01:41, 22 September 2021 (UTC)[reply]
  • So your guidance would be that all books and editions are notable if we have items about their publisher? --- Jura 22:23, 26 September 2021 (UTC)[reply]
@Jura1: I'm not saying any book these publishers produced is automatically notable. What I am saying is that the publishers are already notable, because they exist in Wikidata. Publishers get their notability from the books they publish, and this project will add information about these, especially information that is hard to get but useful for the public. Antoine2711 (talk) 02:02, 29 September 2021 (UTC)[reply]
@Jura1: Actually, my understanding of Antoine's comment is that he refers to the structural-need notability criterion. By linking books to publishers (and to authors as well), we make the statements about them more useful. We will also very likely improve the overall information about books, by including ISBNs in editions and linking them to works. For instance, Q3207769 is a work with an ISBN, which is not structurally correct. We would create the edition for the ISBN and correctly link it to the work. Furthermore, my understanding that books meet Wikidata's notability criteria is based on the fact that they are instances of a "clearly identifiable conceptual or material entity". ChristianBRoy (talk)
  • Yeah, I see your point of view, but I don't think that's the way WD:N is generally understood. Otherwise we would end up having every book and edition for larger publishing groups. The only books that you wouldn't consider notable would be the self-published ones. There are various databases for ISBNs, maybe try these instead? --- Jura 13:37, 25 October 2021 (UTC)[reply]
The other ISBN databases do not have the same potential reach for the general public, I believe. Also, as far as I know, none will offer the same possibility to easily link a work to a location or a character, and then run queries around those. And as I said, the idea is not to dump a huge list of ISBNs, but rather a human-curated list of a few hundred works. In addition, I am curious as to what makes "having every book and edition for larger publishing groups" not a suitable option. Is that mostly a Wikidata performance / limited-resources concern, or are there editorial reasons why this would not be interesting? (Honest question, not a trap; I just want to have a better understanding of what makes contributions interesting or not.) ChristianBRoy (talk) 16:04, 17 November 2021 (UTC)[reply]

@ChristianBRoy: This seems to be stale, is this still active? Perhaps @Ymblanter, Lymantria: could comment? Thanks. Mike Peel (talk) 22:16, 18 January 2022 (UTC)[reply]

Difficult one. I have doubts about adding editions too easily. Lymantria (talk) 06:38, 19 January 2022 (UTC)[reply]
@Lymantria I could change the bot behaviour and remove the editions feature, if that is a concern. However, this would be contrary to the model used by Wikidata:WikiProject_Books. Moreover, editions are useful for ISBNs (which in turn are very important identifiers for books). Hence the dilemma, I guess. ChristianBRoy (talk) 13:28, 19 January 2022 (UTC)[reply]
Thanks for asking @Mike Peel! Yes it is still active from my point of view. There is still interest in proceeding. ChristianBRoy (talk) 13:21, 19 January 2022 (UTC)[reply]
  • Staff had to plan for the deletion of items that were created too easily for scholarly articles.
Given the somewhat alarming status of the Query Service at the end of 2022, I don't see how we would have space to mirror ISBN registries/OPACs here.
If it's Wikibase's technology that interests Quebec's National Library, it's possible to set up a separate instance of Wikibase on its own server. --- Jura 14:39, 19 January 2022 (UTC)[reply]
@Jura1 for the sake of clarity, this bot is not at all affiliated with or related to Quebec's National Library; sorry if somehow my comments created confusion about that. That being said, I understand from your comment regarding the query service that your concern is the number of new Wikidata items that would be created, is that correct? If so, what would be a reasonable number? I also understand from your comment that you do not see the interest of mirroring ISBN registries here... are you also saying that there is no interest at all in having books on Wikidata? Otherwise, would my suggestion above, to not create version, edition, or translation (Q3331189) items, make sense? ChristianBRoy (talk) 21:10, 21 January 2022 (UTC)[reply]
  • The same can be run by other organizations.
On Wikidata, one needs to follow WD:N. This is not met by excluding self-published books. --- Jura 10:20, 22 January 2022 (UTC)[reply]
From WD:N, I understand that any book from an author that has a Wikipedia page or Wikidata item is notable. I feel that may be a bit too broad (notable authors may have written non-notable books), but at least it is an objective criterion that I can code into the bot. The human operator of the bot would pick books based on their overall interest, but the bot would block uploading any book that does not have a link to an existing Wikipedia or Wikidata author page/item. If that sounds good, we could proceed to a test run on a small sample of books that could be reviewed (as per the approval process on Wikidata:Bots). ChristianBRoy (talk) 14:35, 25 January 2022 (UTC)[reply]
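For illustration only, a minimal sketch of such a guard using pywikibot (this is not ChristianBRoy's actual code; the helper names and the book dictionary are hypothetical):

# Minimal sketch of the proposed notability guard, assuming pywikibot.
# has_author_item and maybe_create_book are hypothetical helper names.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

def has_author_item(author_qid):
    """Return True if the author QID exists and is not a redirect."""
    item = pywikibot.ItemPage(repo, author_qid)
    return item.exists() and not item.isRedirectPage()

def maybe_create_book(book, author_qid):
    # Block the upload if the author has no existing Wikidata item.
    if not has_author_item(author_qid):
        print(f"Skipping {book['title']}: author {author_qid} is not on Wikidata")
        return None
    # ... create the book item and its statements here (omitted) ...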

Given the long discussion here, I am trying to figure out ways to move forward... Is there anything I can do to help? My last suggestion (a test run) still holds, and I am open to other suggestions as well! --ChristianBRoy (talk) 18:46, 3 March 2022 (UTC)[reply]

@Mike Peel, Jura1, Lymantria: I have a problem here. We have a user who follows the recommendations and comes here with a reasonable request to be granted the bot status. He says he's going to curate a thousand books per year here on Wikidata, that half of them already exist, and that his 10 publishers already exist. So it's mainly enrichment, certainly not database dumping. Also, he says that ALL his data will be curated by a human, which also implies data of high quality. Now, Jura1 suggests that he wants to drop hundreds of thousands of books on Wikidata and that this will create a technological problem, or rather put pressure on an already concerning issue of item creation. But nowhere did he, or I who came here to explain his project, say such a thing. This project is perfect for Wikidata. It's an enhancement of a small dataset of very particular data, cleaned up by human hands. We should welcome him instead of tripping him up as we have been doing for the last 8 months. Or maybe I missed something, but I tell you, I've been around Wikidata and the Wikimedia Foundation for the last 4 years, so I did my homework. So, what do you need for this request to go forward? Regards, Antoine2711 (talk) 03:33, 27 April 2022 (UTC)[reply]


@Mike Peel, Lymantria: So, this user said he will modify and create 1,000 Wikidata items over a period of a year. Could we start a test like we did with my bot, and have ChristianBRoy modify and create a hundred books on Wikidata to test his bot? I am an OpenRefine user myself, but his bot is really going to work without direct human intervention. What would be the next step? Regards, Antoine2711 (talk) 02:57, 9 June 2022 (UTC)[reply]

AmmarBot 4[edit]

AmmarBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Ammarpad (talkcontribslogs)

Task/s: Import data for number of pages (P1104) from English Wikipedia

Code: book_data.py

Function details: The bot will periodically iterate through the pages using the {{Infobox book}} template on English Wikipedia and primarily attempt to extract the page-number value if it exists and is valid. It will then add it to the corresponding data item of the page. If the page has no data item, it will be skipped; likewise if the value is not valid. Additionally, where either ISBN-13 (P212)/ISBN-10 (P957) or OCLC control number (P243) exists (or both), is not already on Wikidata, and is valid, they'll be imported too. This script is also written as part of my Outreachy program work and my mentor is Mike Peel. --Ammarpad (talk) 10:56, 19 July 2021 (UTC)[reply]
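For illustration, a rough pywikibot/mwparserfromhell sketch of the described flow; the actual book_data.py may be organised differently, and the ISBN/OCLC part is omitted here:

# Rough sketch of the page-number import, assuming pywikibot and mwparserfromhell.
# This is not the real book_data.py; it only illustrates the described steps.
import re
import pywikibot
import mwparserfromhell

enwiki = pywikibot.Site("en", "wikipedia")
repo = enwiki.data_repository()
template = pywikibot.Page(enwiki, "Template:Infobox book")

for page in template.getReferences(only_template_inclusion=True,
                                   namespaces=[0], total=100):  # small test batch
    code = mwparserfromhell.parse(page.text)
    infoboxes = [t for t in code.filter_templates()
                 if t.name.matches("Infobox book")]
    if not infoboxes or not infoboxes[0].has("pages"):
        continue
    raw = infoboxes[0].get("pages").value.strip_code().strip()
    if not re.match(r"^\d+$", raw):
        continue                              # skip invalid values
    try:
        item = pywikibot.ItemPage.fromPage(page)
    except pywikibot.exceptions.NoPageError:
        continue                              # skip pages without a data item
    item.get()
    if "P1104" in item.claims:
        continue                              # number of pages already present
    claim = pywikibot.Claim(repo, "P1104")    # number of pages
    claim.setTarget(pywikibot.WbQuantity(int(raw), site=repo))
    item.addClaim(claim, summary="Import number of pages from enwiki infobox")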

  • What if the item isn't an instance of a book? Can we see sample edits? BrokenSegue (talk) 15:36, 19 July 2021 (UTC)[reply]
    I think number of pages (P1104) is not limited to books only; the doc says "number of pages in an edition of a written work", and written work comprises many things beyond bound books. I will do a test run now. Ammarpad (talk) 09:57, 20 July 2021 (UTC)[reply]
    The test run is completed. Ammarpad (talk) 11:05, 20 July 2021 (UTC)[reply]
  • I don't understand how it can apply to a written work. The same work can be published in multiple editions with different page counts... BrokenSegue (talk) 23:38, 20 July 2021 (UTC)[reply]
    @BrokenSegue: Different editions have different items here, and the enwp article should be linked to the correct one if appropriate. If not, the enwp article should be moved to the correct item, along with content about that specific edition. Thanks. Mike Peel (talk) 09:47, 21 July 2021 (UTC)[reply]
    @Mike Peel: Right, so we shouldn't be adding page numbers to literary works. BrokenSegue (talk) 14:57, 21 July 2021 (UTC)[reply]
    @BrokenSegue: There seems to be a disconnect here; your follow-up wasn't what I meant. Can you expand more please? Thanks. Mike Peel (talk) 19:53, 21 July 2021 (UTC)[reply]
    @Mike Peel: Different editions have different items. Only editions of works have page counts. If a Wikipedia article is linked to a literary work item but has a page number, we shouldn't import the page number (works don't have page numbers, editions do), because in this scenario the Wikipedia article is likely confused or linked to the wrong item. Either way, don't import. BrokenSegue (talk) 20:50, 21 July 2021 (UTC)[reply]
    Thank you BrokenSegue, I am working on this now; I will restrict the script to instances of book. Ammarpad (talk) 11:32, 24 July 2021 (UTC)[reply]
    Implemented this. Now restricted to instances of book (Q571). Ammarpad (talk) 06:12, 29 July 2021 (UTC)[reply]
    Sorry, book is probably too restrictive. You want to include other sensible things number of pages (P1104) supports like version, edition, or translation (Q3331189). Also, the code link above no longer works. BrokenSegue (talk) 15:58, 29 July 2021 (UTC)[reply]
    I replaced the broken link. I also added version, edition, or translation (Q3331189) as a supported instance. Ammarpad (talk) 07:42, 30 July 2021 (UTC)[reply]
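A possible shape for that restriction (illustrative only; the real script may differ):

# Sketch of the instance-of restriction discussed above, assuming pywikibot.
# The class list mirrors the discussion and could be extended.
ALLOWED_CLASSES = {"Q571", "Q3331189"}  # book; version, edition or translation

def is_supported_item(item):
    """Return True if the item is an instance of an allowed class."""
    item.get()
    instance_of = item.claims.get("P31", [])
    return any(c.getTarget() and c.getTarget().id in ALLOWED_CLASSES
               for c in instance_of)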
    • I share BrokenSegue's concerns. Wikipedia is unlikely to have articles on (different) editions of the same work (except the Bible). --- Jura 08:11, 7 August 2021 (UTC)[reply]
      • Thanks, I believe I have already made changes to the script based on his feedback above. Ammarpad (talk) 21:53, 8 August 2021 (UTC)[reply]
        • Can we see a new test run then? The one above still adds it to things like written works. Maybe adding it as a qualifier to ISBN numbers instead could be a solution. --- Jura 09:53, 9 August 2021 (UTC)[reply]

@Ammarpad: Is this bot request still active, or should it be archived? Thanks. Mike Peel (talk) 22:12, 18 January 2022 (UTC)[reply]

RonniePopBot[edit]

RonniePopBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: RonnieV (talkcontribslogs)

Task/s:

  • Provide the population numbers of regions, municipalities, places, ... retrieved from the national offices of statistics (INSEE, ONS, CBS, Statistični urad, ...) to help Wikipedias (and others) show numbers that are as current as possible, but also historical numbers.
  • Updating (yearly) the number of boarding train passengers at Belgian railway stations.

Code:

Function details: I, as operator, retrieve (files with) population numbers from national offices of statistics. I will have a look at what we can use from these numbers, and if possible, I will tell RonniePopBot to add this information to Wikidata. RonniePopBot will run through these numbers, and if they are not yet available, they will be added. A minimal set will consist of population (P1082), with point in time (P585) and a source. determination method (P459) and criterion used (P1013) will be provided when available.

The script will try to find a unique Wikidata item using a national (regional) code from the national office. If these codes are not yet available in Wikidata, a table linking codes to Q-ids will be made and provided to the script. When duplicates are found in Wikidata (most likely: successive municipalities), the script will not act, but report. These numbers will be added later, after replacing the code with the right Q-id for the given moment.
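A possible shape for that lookup, sketched with pywikibot's SPARQL helper and assuming the INSEE municipality code property (P374); duplicates are only reported, as described above:

# Sketch of the code-to-item lookup, assuming pywikibot's SPARQL helper.
# Real code should validate/escape the code before substituting it into the query.
from pywikibot.data import sparql

QUERY = """
SELECT ?item WHERE {
  ?item wdt:P374 "%s" .
}
"""

def find_item_for_insee(code):
    """Return a single Q-id for an INSEE municipality code, or None."""
    endpoint = sparql.SparqlQuery()
    results = endpoint.select(QUERY % code) or []
    if len(results) == 1:
        return results[0]["item"].split("/")[-1]   # e.g. 'Q90'
    if len(results) > 1:
        print(f"INSEE {code}: multiple items found, skipping and reporting")
    return None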

The first run will be to add the population numbers from 1968 till 2016 from INSEE for French municipalities. --RonnieV (talk) 10:06, 4 July 2021 (UTC)[reply]

RonniePopBot has made a bunch of changes.
[4] is reported as four changes (all at about the same moment), but consists of adding the number of inhabitants, giving it normal rank, and adding a point in time (P585), a determination method (P459) and a criterion used (P1013). In the next edit, a source is given, consisting of a reference URL (P854), a publisher (P123), a retrieved (P813) and a CELEX number (P476).
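For illustration, a minimal pywikibot sketch of such an edit (the values, dates and URL are placeholders; the remaining reference properties would be added the same way):

# Minimal sketch of one population edit, assuming pywikibot.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

def add_population(item, value, year, source_url, retrieved_date):
    claim = pywikibot.Claim(repo, "P1082")                  # population
    claim.setTarget(pywikibot.WbQuantity(value, site=repo))
    item.addClaim(claim, summary="Add INSEE population figure")

    when = pywikibot.Claim(repo, "P585")                    # point in time
    when.setTarget(pywikibot.WbTime(year=year, month=1, day=1))
    claim.addQualifier(when)

    ref_url = pywikibot.Claim(repo, "P854")                 # reference URL
    ref_url.setTarget(source_url)
    retrieved = pywikibot.Claim(repo, "P813")                # retrieved
    retrieved.setTarget(retrieved_date)                      # a pywikibot.WbTime
    claim.addSources([ref_url, retrieved], summary="Add source")
    # P459, P1013, P123, P476 etc. would be added in the same way when available.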
 Support - We will use this updated info in infoboxes on nl-wiki, and via Wikidata more languages can benefit from the updates. Edoderoo (talk) 09:44, 8 July 2021 (UTC)[reply]
  • Ronnie asked me to have a look at this. Before we get to the technical part: adding historic population numbers to items will make these items a lot larger. I'm not sure what the current consensus is. Store the current number in population (P1082) and the full set of historic numbers in tabular population (P4179)? Last time I checked, no consensus existed for importing all historic data, and that's why we didn't import all the CBS data yet. I do notice that plenty of people went ahead and imported historic data anyway (with various quality issues). Probably best to first get consensus (or link to it) before doing any mass imports? If you're up to it, then we can do a pilot with publishing some of the CBS data to Commons and linking to it from Wikidata. Multichill (talk) 16:31, 15 July 2021 (UTC)[reply]
Thanks, Multichill, for pointing to tabular population (P4179). I was not aware of that property. I found many examples of items in Wikidata where population (P1082) (and other values) is stored as different values, each having its own complete source. I thought that that was the way to go. And as indicated in the description: use point in time (P585) as a qualifier; use preferred rank for the most recent total value. So having multiple values, for different dates, should not be a problem. I see AVMbot adding yearly updates, like March 2021 and January 2020 for French communities. They reset the preferred value, so that looks fine. The population of Canada has more than a hundred values, going back to the early 17th century.
[Embedded graph: Commons example of 2011, with source Wikidata query and sources.]

For all(?) French communities, we have graphs like this one stored on Commons, once made by Michiel1972. Most of these have not been updated since 2011 (having data up to 2008). The information on the Commons page says that information from INSEE is used, but no specific source is indicated. As INSEE now provides yearly data, we could recreate all these images each year. Just for France, there are over 30,000 municipalities, and there are many more places, municipalities, countries, regions, ... that have population numbers and regular updates. Serving this with generated graphs, as described in Template:Graph:Lines, would only require adding one number and date (preferably with a source) to get an updated graph.
The same graphs can be obtained from a table in XML format, but that would require an XML-formatted file like this one to exist for each municipality, place, region, country, ... Files like these are clearly intended for automated use, not for humans to edit. Each time new numbers become available, the table would have to be updated, and the value, sources and other information in population (P1082) replaced. A human editor wanting to add (or adjust) a value might have more trouble editing such a file correctly.
Once I get the example below working, in a way that we can easily indicate which table to use (preferably based on the link to Wikidata available on the Wikipedia article, completely language-free, maybe by retrieving the tabular population (P4179) value and using it as the table name), I am fine with that. (This is solved.) A naming convention which is foolproof, preferably language-independent, and which gives unique values for all places, municipalities, regions, countries, ... in the world would be needed.

[Embedded example graph; raw graph data can be viewed or edited.]

I was not aware of any issues with the use of population (P1082) for multiple values (and dates), as long as the most recent one is marked as preferred. Going for the tabular option would also make this bot require a Commons bot flag, to create thousands of data files there. It would require fewer changes in the first run, but might require more in the future (having to update the Wikidata value, with its requirements and sources, and having to update the data file). Can you point me to any discussion where the use of tabular population (P4179) in combination with population (P1082) is discussed, so I can join it? Thanks, RonnieV (talk) 00:51, 16 July 2021 (UTC)[reply]
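For reference, a tabular population (P4179) target is a Commons Data:….tab page whose content is JSON. A rough sketch of such a page, built here in Python with illustrative values and a hypothetical title:

# Sketch (in Python, for illustration) of the JSON stored in a Commons
# "Data:....tab" page that tabular population (P4179) could point to.
# The description, sample rows and any page title are hypothetical.
import json

tab_page = {
    "license": "CC0-1.0",
    "description": {"en": "Population of an example French commune (INSEE)"},
    "sources": "INSEE, https://www.insee.fr/",
    "schema": {
        "fields": [
            {"name": "year", "type": "number", "title": {"en": "Year"}},
            {"name": "population", "type": "number", "title": {"en": "Population"}},
        ]
    },
    "data": [[1968, 1234], [1975, 1301], [2016, 1480]],  # illustrative values
}

# This JSON would be saved as the content of the Data: page on Commons.
print(json.dumps(tab_page, indent=2))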
I got it working with a data file, so that problem seems to be solved. But what caused graphs to be shown twice? Solved now. Does one of the options have a big advantage in performance?
For another page, I tried to show the current COVID numbers of all European countries on one page, but that led to a memory error. Would this be solved if only the current numbers were in the corresponding Wikidata item, with all historical data (daily updates!) in a data file? If so, that might be an argument for using more data files. Thanks, RonnieV (talk) 14:50, 16 July 2021 (UTC)[reply]
The double graphs were caused by duplicated code in Template:Graph:Lines by Bouzinac. I removed part of the code, so no more problems with that. RonnieV (talk) 15:43, 16 July 2021 (UTC)[reply]
Hi, thanks for correcting the typo. What's the problem with Canada? Checking the main source https://www150.statcan.gc.ca/n1/pub/98-187-x/4151287-eng.htm#table1, it appears that some old years should not be understood as covering the whole of Canada in that year.


Bouzinac💬✒️💛 20:20, 16 July 2021 (UTC)[reply]

@RonnieV: This seems to be stale, is this still active? Perhaps @Ymblanter, Lymantria: could comment? Thanks. Mike Peel (talk) 22:09, 18 January 2022 (UTC)[reply]

Hi Mike Peel, thanks for your reminder.
I would like to continue adding the population for French and Slovenian places, and for others in the future. The suggestion Multichill gave about tabular population (P4179) is a nice one, and I would like to investigate it in the future (later this year). For the time being, allowing me (RonniePopBot) to bring the data for France to population (P1082), as has been done by many for many places in France and other countries, would be a great win for the Dutch Wikipedia, at least for the remaining 8k+ numbers (for about 2,000 not-yet-updated Wikidata items) with INSEE numbers between 16063 and 57645.
If we decide to prefer tabular population (P4179), I have no problem with moving all current info from population (P1082) to text files and (after uploading these files to Commons) linking these files in tabular population (P4179) and deleting all but the most recent value from population (P1082).
I have to dive into it to find out what I did in July (getting things to work with a data file), but I will use that for new additions above INSEE 57645, once I have the relatively few missing numbers inserted. (Currently, places with INSEE below 16063 on the Dutch Wikipedia use graphs with Wikidata information as recent as 2018 or 2019, while all other places use old PNGs, mostly with data up to 2008.)
Thanks again for bringing this back to attention, RonnieV (talk) 22:48, 18 January 2022 (UTC)[reply]
I would also like to update the values of daily patronage (P1373) for the Belgian railway stations, like Q1881203#P1373, with the most recent information from the NMBS counts. I could run this script using my normal account, but using an approved bot account would both make it clearer that I used a script for these edits and help me separate hand-made edits from bot edits. Thanks for considering my request for permission. RonnieV (talk) 00:24, 27 January 2022 (UTC)[reply]
What is needed to get a decision on this request? I don't know what I can do to move this towards a decision. The population for France is now partly filled in, the population for Slovenia needs an update, and the railway passenger numbers for Belgium could use an update. I am willing to invest time in this, but would like it to run with a bot account, not with my main account. RonnieV (talk) 11:44, 2 February 2022 (UTC)[reply]
I would suggest going to Wikidata:Project Chat to discuss the issue with the population import. If there is consensus to do it, we can then quickly get the bot approved.--Ymblanter (talk) 20:09, 3 February 2022 (UTC)[reply]
Thanks, @Ymblanter. I started a discussion at the project chat. Your input is appreciated. Thanks, RonnieV (talk) 17:38, 6 February 2022 (UTC)[reply]
The responses at the chat were far from unanimous. I will try to find some time to get this rolling with P4179 for a few communities, to see how it behaves and whether P1082 could be left completely empty. (How will calls to P1082 be handled if P1082 is empty, but P4179 is available with multiple values over time? Will it use the first line, the last line, or look for the most recent itself?)
Another issue is how to name these files in a systematic way. The INSEE value could be a good starting point, but it is not always related to just one WD item. I have seen communities with 2, 3 or 4 WD items, due to changes over time. I have also seen communities disappear and return. Something like 'Population FR <INSEE> <most recent community name>.<file format extension>'?
Implementing the end and restart dates in P4179 files could be handled, but care should be taken, as a community with a population count on 1 January 1970 that ends at the end of 1972 will not have had a steady decline in its population over those three years. Repeating the 1970 value as an estimate on 31 December 1972 and a zero value on 1 January 1973 could help. A resurrection on 1 January 1974 should be preceded by a zero value on 31 December 1973 (the day before). But if the community has a decline (or growth) between 1970 and 1976 (both census years) and no other values are available, which values should be used for the end of 1972 and the beginning of 1974? Assume an even decline/growth and give both values as estimates?
I will dive into these issues, RonnieV (talk) 15:41, 4 March 2022 (UTC)[reply]
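For illustration, a tiny sketch of the "even decline/growth" estimate mentioned above (simple linear interpolation between two census values; the numbers are purely hypothetical):

# Sketch of the even-decline/growth idea: linear interpolation between two
# census values, assuming the commune existed for the whole period.
def interpolate(year, y0, pop0, y1, pop1):
    """Estimated population for `year`, between census years y0 and y1."""
    return round(pop0 + (pop1 - pop0) * (year - y0) / (y1 - y0))

# e.g. a commune counted in 1970 and 1976 but dissolved at the end of 1972:
# interpolate(1972, 1970, 1000, 1976, 1300) -> 1100 (to be marked as an estimate)
# interpolate(1974, 1970, 1000, 1976, 1300) -> 1200 (for the restart, likewise)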