Shortcuts: WD:RFBOT, WD:BRFA, WD:RFP/BOT

Wikidata:Requests for permissions/Bot

To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks.


Old requests go to the archive.

Once consensus is obtained in favor of granting the botflag, please post requests at the bureaucrats' noticeboard.


Contents

Bot Name Request created Last editor Last edited
Botcrux 3 2017-11-22, 10:38:48 Horcrux92 2017-11-22, 10:38:48
KabouterBot 2017-11-17, 22:48:44 K175 2017-11-22, 13:19:11
CyclingInitBot 2017-11-19, 11:47:48 Ymblanter 2017-11-19, 11:47:48
NIOSH bot 2017-11-14, 05:59:08 Ymblanter 2017-11-19, 11:51:46
Citationgraph bot 2017-11-14, 05:40:01 ArthurPSmith 2017-11-20, 00:59:31
AliciaFagervingWMSE-bot 15 2017-11-10, 07:38:51 151.177.36.105 2017-11-19, 19:06:08
ReosarevokBot 2017-11-08, 20:49:36 Lymantria 2017-11-12, 20:02:35
AndreCostaWMSE-bot 18 2017-11-07, 14:33:54 Lymantria 2017-11-12, 07:15:43
AndreCostaWMSE-bot 17 2017-11-07, 14:28:13 Ymblanter 2017-11-10, 13:02:03
AndreCostaWMSE-bot 16 2017-11-06, 19:45:45 Ymblanter 2017-11-09, 12:31:23
Revibot 4 2017-11-06, 16:02:52 Ymblanter 2017-11-09, 00:00:30
AndreCostaWMSE-bot 15 2017-11-06, 13:40:36 Ymblanter 2017-11-09, 00:02:44
AndreCostaWMSE-bot 14 2017-11-03, 15:18:10 Lymantria 2017-11-07, 06:46:50
AndreCostaWMSE-bot 13 2017-11-03, 11:35:35 Lymantria 2017-11-07, 06:46:44
AndreCostaWMSE-bot 12 2017-11-03, 11:22:12 Lymantria 2017-11-07, 06:46:34
AndreCostaWMSE-bot 11 2017-11-03, 11:15:52 Lymantria 2017-11-07, 06:46:28
AliciaFagervingWMSE-bot 14 2017-11-03, 09:17:26 Lymantria 2017-11-07, 06:46:21
AliciaFagervingWMSE-bot 13 2017-11-03, 08:34:13 Lymantria 2017-11-07, 06:45:10
EmptyBot 2017-11-02, 16:50:37 Ymblanter 2017-11-04, 03:47:46
AliciaFagervingWMSE-bot 12 2017-11-01, 15:16:10 Ymblanter 2017-11-05, 04:08:12
AliciaFagervingWMSE-bot 11 2017-11-01, 14:53:46 Ymblanter 2017-11-05, 04:06:40
AndreCostaWMSE-bot 10 2017-11-01, 13:31:49 Ymblanter 2017-11-05, 04:05:12
AliciaFagervingWMSE-bot 10 2017-11-01, 12:37:42 Ymblanter 2017-11-05, 04:03:22
AndreCostaWMSE-bot 9 2017-11-01, 11:04:46 Ymblanter 2017-11-04, 03:50:00
AndreCostaWMSE-bot 8 2017-10-31, 20:38:21 Ymblanter 2017-11-03, 05:54:19
AndreCostaWMSE-bot 7 2017-10-31, 20:20:58 Ymblanter 2017-11-02, 21:23:06
AliciaFagervingWMSE-bot 9 2017-10-31, 13:17:19 Ymblanter 2017-11-02, 16:53:03
AndreCostaWMSE-bot_6 2017-10-30, 16:33:16 Ymblanter 2017-11-02, 16:49:44
AndreCostaWMSE-bot 5 2017-10-30, 15:41:02 Ymblanter 2017-11-02, 16:47:37
Position_holder_history_bot 2017-10-27, 07:17:46 Ymblanter 2017-11-05, 04:00:57
neonionbot 2017-10-19, 06:15:18 ArthurPSmith 2017-10-19, 13:12:49
Handelsregister 2017-10-16, 07:39:42 ChristianKl 2017-11-20, 09:58:25
TheStoneBot 2017-10-13, 12:57:52 Ymblanter 2017-10-16, 20:20:10
Peuc_bot 2 2017-10-12, 15:08:26 Peuc 2017-10-16, 02:48:24
JoRobot 2 2017-10-06, 20:18:08 Lymantria 2017-10-14, 09:41:53
Peuc_bot 2017-10-05, 06:53:26 Ymblanter 2017-10-09, 20:56:12
PositionStatements_Bot 2017-09-21, 12:00:31 Ymblanter 2017-09-27, 19:52:18
Alexabot 2017-09-24, 15:20:49 Tozibb 2017-11-09, 22:59:00
FutoohBot 2017-09-21, 14:27:17 Lymantria 2017-09-24, 12:50:24
CoBot 2017-09-15, 04:24:56 Lymantria 2017-09-27, 05:30:22
AftabBot 2017-09-07, 17:48:42 Ymblanter 2017-09-15, 19:32:16
AliciaFagervingWMSE-bot 8 2017-09-07, 13:15:23 Ymblanter 2017-11-02, 17:10:51
Sartle.wiki.bot 2017-09-06, 20:07:45 Ymblanter 2017-09-16, 20:58:14
mastiBot 1 2017-09-04, 17:46:05 GZWDer 2017-09-04, 17:46:05
AndreCostaWMSE-bot 4 2017-08-28, 14:27:52 Ymblanter 2017-08-29, 13:36:04
VortBot 2017-08-26, 17:20:22 Ymblanter 2017-08-31, 16:35:31
CzechoBot 2017-08-25, 18:19:46 Ymblanter 2017-09-08, 05:54:39
ScorumMEBot 2017-08-23, 14:21:43 Ymblanter 2017-09-04, 07:36:04
JudgeBot 2017-08-22, 15:26:20 Lymantria 2017-08-27, 07:06:02
Prompter Bot 2017-08-21, 11:44:44 Ymblanter 2017-08-26, 20:28:07
BandMemberBot 2017-08-11, 22:00:11 Lymantria 2017-08-27, 07:08:46
AliciaFagervingWMSE-bot 7 2017-08-10, 19:03:28 Ymblanter 2017-08-21, 14:36:55
Bekicot 2 2017-08-09, 03:39:17 Ymblanter 2017-08-19, 04:03:41
Bekicot 2017-08-02, 12:52:26 Yana agun 2017-08-16, 11:06:10
JarBot 4 2017-08-01, 17:18:43 Ymblanter 2017-08-15, 02:15:07
SalviBot 2017-07-27, 18:20:50 Ymblanter 2017-08-05, 12:22:29
Valerio Bozzolan bot 2017-07-25, 19:01:02 Lymantria 2017-07-30, 17:54:40
InwBot 2 2017-07-20, 21:21:48 Ymblanter 2017-07-28, 19:46:53
PokestarFanBot 5 2017-07-21, 02:55:55 Lymantria 2017-07-22, 06:08:32
Framabot 2 2017-07-19, 22:32:29 Lymantria 2017-07-21, 10:53:45
HiveBot 2017-07-16, 18:40:47 Lymantria 2017-09-16, 20:38:00
DanmicholoBot 7 2017-07-14, 15:27:34 Lymantria 2017-07-21, 10:52:51
Emijrpbot 9 2017-07-08, 10:44:54 Ymblanter 2017-07-12, 18:45:26
Jntent's Bot 1 2017-06-30, 23:37:11 XXN 2017-07-17, 14:10:43
JarBot 3 2017-06-24, 21:57:24 Ymblanter 2017-07-12, 18:46:26
WikiCompBot 2017-03-09, 05:43:59 Pasleim 2017-07-11, 08:07:29
WikiProjectFranceBot 2017-05-08, 20:01:48 Lymantria 2017-06-25, 20:40:52
legislator info 2017-05-13, 19:44:16 Pasleim 2017-07-11, 08:11:06
Jefft0Bot 2017-04-17, 15:16:29 Ymblanter 2017-07-28, 19:48:26
hz.cmu.bot 2017-03-14, 16:17:51 XXN 2017-06-30, 20:45:58
MsynBot 1 2017-06-25, 17:11:09 GZWDer 2017-06-25, 17:11:09
MatSuBot 8 2017-06-15, 16:40:47 Lymantria 2017-07-22, 06:05:25
PoliticianBot 1 2017-06-25, 17:11:07 GZWDer 2017-06-25, 17:11:07
MexBot 2 2017-06-08, 03:00:53 ValterVB 2017-06-25, 14:32:26
Emijrpbot 8 2017-03-25, 11:42:28 Matěj Suchánek 2017-06-09, 06:47:22
ZacheBot 2017-03-04, 23:29:38 Zache 2017-07-11, 11:13:15
НСБот 2017-02-24, 12:12:11 Ymblanter 2017-03-03, 08:48:41
MechQuesterBot 2 2017-02-26, 22:31:51 Ymblanter 2017-07-06, 19:18:47
YULbot 2017-02-21, 18:05:13 ChristianKl 2017-06-25, 09:35:54
JayWackerBot 2017-02-09, 17:26:47 JayWacker 2017-03-01, 18:18:50
YBot 2017-01-12, 16:43:19 Pasleim 2017-01-15, 19:26:39
EaasServiceBot 2017-01-10, 15:09:13 Ymblanter 2017-03-05, 00:00:16
DiscogsBot 2016-12-12, 11:32:55 Pasleim 2017-01-15, 19:52:08
DoctorBot 2016-11-27, 03:01:24 DoctorBud 2016-12-21, 00:51:03
WikiLovesESBot 2016-07-03, 10:25:13 Jura1 2016-08-26, 08:42:53
MatSuBot 6 2016-07-01, 19:12:23 Matěj Suchánek 2017-06-20, 18:49:13
1-Byte-Bot 2016-03-02, 15:23:09 Mbch331 2017-08-25, 20:57:35
Hkn-bot 2016-01-16, 18:52:00 Mbch331 2017-09-16, 05:53:22
Dexbot 11 2015-04-07, 18:15:00 Ladsgroup 2017-05-12, 14:56:50
KunMilanoRobot 2014-01-21, 19:27:44 Alphama 2016-06-21, 18:15:06
AviBot 2016-05-17, 21:29:54 Mbch331 2017-08-25, 20:37:22
InteliBOT 2015-01-16, 20:14:11 Ymblanter 2017-08-27, 20:17:40
Mahirbot 2016-02-25, 04:22:59 Mbch331 2017-08-25, 20:34:41
SaamDataImportBot 2016-04-20, 18:45:43 Mbch331 2017-08-25, 20:30:48
welvon-bot 2017-02-12, 19:47:23 Vogone 2017-02-12, 19:47:23

Botcrux

Botcrux (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Horcrux92 (talk • contribs • logs)

Task/s: Importing BabelNet id (P2581) from babelnet.org. I've seen we have only a handful of usages of this property.

Function details: The job is pretty simple: the BabelNet API (permalink) is used to convert a Wikidata id to a synset id. --Horcrux92 (talk) 10:38, 22 November 2017 (UTC)
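
A minimal sketch of the conversion step described above, in Python with requests and pywikibot. This is not Horcrux92's actual code; the BabelNet endpoint, parameter names and the API key are assumptions/placeholders based on the public BabelNet HTTP API.

import requests
import pywikibot

BABELNET_KEY = "YOUR_API_KEY"  # hypothetical placeholder

def babelnet_synset_for(qid):
    """Ask BabelNet which synset corresponds to a Wikidata item (assumed endpoint/params)."""
    resp = requests.get(
        "https://babelnet.io/v5/getSynsetIds",  # assumed endpoint
        params={"id": qid, "source": "WIKIDATA", "key": BABELNET_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    return data[0]["id"] if data else None  # e.g. a "bn:..." synset id

repo = pywikibot.Site("wikidata", "wikidata").data_repository()
item = pywikibot.ItemPage(repo, "Q42")  # placeholder item
synset = babelnet_synset_for(item.getID())
if synset and "P2581" not in item.get()["claims"]:
    claim = pywikibot.Claim(repo, "P2581")  # BabelNet id
    claim.setTarget(synset)
    item.addClaim(claim, summary="Importing BabelNet id from babelnet.org")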

KabouterBot

KabouterBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: K175 (talk • contribs • logs)

Task/s: Linking birth year and death year categories on af: to corresponding Wikidata items, e.g. af:Kategorie:Geboortes in 1959/en:Category:1959 births (Q6647554).

Code: [1]

Function details: Systematically run through all birth year and death year categories on af:, check if already linked to corresponding Wikidata item (search by English title), if not, determine Wikidata item and establish link. --K175 (talk) 22:48, 17 November 2017 (UTC)
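
A minimal sketch of the loop described above, assuming current pywikibot and the af: category title pattern; this is not the bot's actual code (that is linked as [1] above).

import pywikibot

af = pywikibot.Site("af", "wikipedia")
en = pywikibot.Site("en", "wikipedia")

for year in range(1800, 2018):  # illustrative year range
    af_cat = pywikibot.Category(af, f"Kategorie:Geboortes in {year}")
    if not af_cat.exists():
        continue
    try:
        # Already linked to a Wikidata item? Then there is nothing to do.
        pywikibot.ItemPage.fromPage(af_cat)
        continue
    except pywikibot.exceptions.NoPageError:
        pass
    # Otherwise find the item via the corresponding English category title
    # and attach the af: sitelink to it.
    en_cat = pywikibot.Category(en, f"Category:{year} births")
    item = pywikibot.ItemPage.fromPage(en_cat)
    item.setSitelink(af_cat, summary="Add af: birth year category sitelink")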

Test run has been completed successfully. Edit summary has been updated to be more descriptive of the task. CC @User:Ymblanter K175 (talk) 13:19, 22 November 2017 (UTC)

CyclingInitBot

CyclingInitBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
(I copy this here as it doesn't appear; no idea why.)

Operator: Psemdel (talk • contribs • logs)

Script to initiate Wikidata items linked to cycling.

Task/s:

  • (Ready) Init pages of type "Country" women's national road cycling team "Year". It means:
    • Create the item (if not existing)
    • Fill the label, description, alias and some properties (Country for instance).
    • Link the item to the item of the year before, after and the page "Country" women's national road cycling team.

For an example see Q26215293.

  • (Close future) Same idea but with national championships.
  • Init stage races. So stage 1, stage 2 and link them.

Code: See user page.

Function details: Basically there is a table with all country names and associated codes. There is a function to search for an item, and another to add a value to a property. The linking logic is in the main function.
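
A rough sketch of the "create and link" idea, assuming pywikibot; the property choices below (P17 country, P155 follows, P156 followed by) are illustrative assumptions, not necessarily what the script actually uses.

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def create_team_season_item(label_en, country_qid, previous_qid=None):
    item = pywikibot.ItemPage(repo)                 # new, empty item
    item.editLabels({"en": label_en}, summary="Creating season item")

    country = pywikibot.Claim(repo, "P17")          # country (illustrative property choice)
    country.setTarget(pywikibot.ItemPage(repo, country_qid))
    item.addClaim(country)

    if previous_qid:
        follows = pywikibot.Claim(repo, "P155")     # follows: the previous season
        follows.setTarget(pywikibot.ItemPage(repo, previous_qid))
        item.addClaim(follows)

        prev = pywikibot.ItemPage(repo, previous_qid)
        followed_by = pywikibot.Claim(repo, "P156") # reverse link on the previous item
        followed_by.setTarget(item)
        prev.addClaim(followed_by)
    return item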

Questions: I am a beginner. I don't know how to get a bot flag. --Psemdel (talk) 18:34, 12 November 2017 (UTC)

NIOSH bot

NIOSH bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Harej (talk • contribs • logs)

Task/s: Synchronize Wikidata with the NIOSHTIC-2 research database.

Code: https://github.com/harej/niosh2wikidata

Function details: NIOSHTIC-2 is a database of occupational safety and health research published by NIOSH and/or supported by NIOSH staff. As part of my work with NIOSH I have developed scripts to make sure NIOSHTIC has corresponding entries in Wikidata (but, where possible, it will not create duplicates of entries that already exist on Wikidata). This allows NIOSH's data to be part of a greater network of data, for instance by including data from other sources such as PubMed. Better indexing this data is part of a longer-term effort to make it easier for Wikipedia editors to discover these reliable resources. --Harej (talk) 05:59, 14 November 2017 (UTC)
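
Not the niosh2wikidata code itself, just a sketch of the duplicate check such a sync needs: ask the Wikidata Query Service whether an item with a given NIOSHTIC-2 ID (P2880) already exists before creating one. The ID used at the end is a placeholder.

import requests

WDQS = "https://query.wikidata.org/sparql"

def item_for_nioshtic_id(nioshtic_id):
    query = 'SELECT ?item WHERE { ?item wdt:P2880 "%s" . } LIMIT 1' % nioshtic_id
    resp = requests.get(WDQS, params={"query": query, "format": "json"}, timeout=60)
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return bindings[0]["item"]["value"] if bindings else None  # item URI or None

# Only create a new item when nothing is found (placeholder ID).
if item_for_nioshtic_id("20045000") is None:
    pass  # create the item / add statements here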

Please make some test edits.--Ymblanter (talk) 11:51, 19 November 2017 (UTC)

Citationgraph bot

Citationgraph bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Harej (talk • contribs • logs)

Task/s: Add citation relationships between papers on Wikidata.

Code: pmcid_to_cites.py, citation_grapher.py, edit_queue.py.

Function details: There are many Wikidata items about journal articles. This bot connects them to each other via cites (P2860). Identifying these relationships helps identify the provenance of information from original clinical research to review. This also helps identify influential papers in a given field. The bot will run on a daily basis. --Harej (talk) 05:39, 14 November 2017 (UTC)
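
A minimal sketch of the core edit performed by the linked scripts, assuming pywikibot; the QIDs in the usage line are placeholders.

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def add_citation(citing_qid, cited_qid):
    citing = pywikibot.ItemPage(repo, citing_qid)
    existing = citing.get()["claims"].get("P2860", [])
    if any(c.getTarget() and c.getTarget().getID() == cited_qid for c in existing):
        return  # relationship already recorded
    claim = pywikibot.Claim(repo, "P2860")          # cites
    claim.setTarget(pywikibot.ItemPage(repo, cited_qid))
    citing.addClaim(claim, summary="Adding citation relationship")

add_citation("Q21090025", "Q21090066")              # placeholder QIDs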

Please make some test edits.--Ymblanter (talk) 11:51, 19 November 2017 (UTC)
Hi @Harej: - what's your data source for this? It sounds like a great idea but provenance for this info may be important. ArthurPSmith (talk) 00:59, 20 November 2017 (UTC)

AliciaFagervingWMSE-bot 15

AliciaFagervingWMSE-bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Alicia Fagerving (WMSE) (talk • contribs • logs)

Task/s: The function of the bot is to import data about immovable cultural heritage to Wikidata as part of Wikimedia Sverige's Connected Open Heritage Project.

This request is for data about cultural heritage monuments in Serbia from the Wiki Loves Monuments Database.

Code: The bot uses Python and the Pywikibot framework. The code is up on Github: Framework, Specific table processing script


Function details:

The bot processes data from the Wiki Loves Monuments Database, in this case the rs (Serbia) dataset, about 2,300 items. They will all make use of cultural heritage monument in Serbia ID (P4245) as an identifier.


Some test edits have been made: no label (Q16085401), no label (Q3280273), no label (Q20434834), no label (Q42841509), no label (Q42841499).

Ping @André Costa (WMSE): -- Alicia Fagerving (WMSE) (talk) 07:38, 10 November 2017 (UTC)

  • Language code for labels should probably be "sr-ec" instead of just "sr" [2]. Please add a more specific P31 than "Q2065736".
    --- Jura 07:56, 10 November 2017 (UTC)
    • @Jura1: There is no information in the tables which would allow for the determination of a more specific instance of (P31). Note that cultural property (Q2065736) is only used in items that don't already have a instance of (P31) value. /André Costa (WMSE) (talk) 12:09, 10 November 2017 (UTC)
      • An automated translation of the label gave me "archeological site". Q839954 would be sufficient.
        --- Jura 12:30, 10 November 2017 (UTC)
        @André Costa (WMSE):@Alicia Fagerving (WMSE): Any reaction to this?--Ymblanter (talk) 11:50, 19 November 2017 (UTC)
        • @Jura1: Sorry, I missed your reply. Relying on free-text matching of the labels for P31 is something we tried before with quite bad results (even with the support of someone speaking the language). The problem was too many false positives (e.g. labels may include names of people or places which, like "Newcastle", look like a type description) which were hard to detect and fix afterwards. When there has been a "type" column in the tables, we have done such matching with much more successful results. /André (logged out for holidays) 19:05, 19 November 2017 (UTC)

neonionbot

neonionbot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Jkatzwinkel (talk • contribs • logs)

Task/s: Map semantic annotations made with annotation software neonion to wikidata statements in order to submit either bibliographical evidence, additional predicates or new entities to wikidata. Annotation software neonion is used for collaborative semantic annotating of academic publications. If a text resource being annotated is an open access publication and linked to a wikidata item page holding bibliographical metadata about the corresponding open access publication, verifiable contributions can be made to wikidata by one of the following:

  1. For a semantic annotation, identify an equivalent wikidata statement and provide bibliographical reference for that statement, linking to the item page representing the publication in which the semantic annotation has been created.
  2. If a semantic annotation provides new information about an entity represented by an existing wikidata item page, create a new statement for that item page containing the predicate introduced by the semantic annotation. Attach bibliographic evidence to the new statement analogously to scenario #1.
  3. If a semantic annotation represents a fact about an entity not yet represented by a wikidata item page, create an item page and populate it with at least a label and a P31 statement in order to meet the requirements for scenario #2. Provide bibliographical evidence as in scenario #1.


Code: Implementation of this feature will be published on my neonion fork on github.

Function details: Prerequisite: Map model of neonion's controlled vocabulary to terminological knowledge extracted from wikidata. Analysis of wikidata instance/class relationships ensures that concepts of controlled vocabulary can be mapped to item pages representing wikidata classes.

Task 1: Identify item pages and possibly statements on wikidata that are equivalent to the information contained in semantic annotations made in neonion.

Task 2: Based on the results of task 1, determine if it is appropriate to create additional content on wikidata in form of new statements or new item pages. For the statements at hand, provide an additional reference representing bibliographical evidence referring to the wikidata item page representing the open access publication in which neonion created the semantic annotation.
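
A sketch of the reference-adding step in scenario 1, assuming pywikibot; the item, property and publication QIDs are placeholders and this is not part of the actual neonion implementation.

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def add_bibliographic_reference(item_qid, prop, publication_qid):
    item = pywikibot.ItemPage(repo, item_qid)
    for claim in item.get()["claims"].get(prop, []):
        source = pywikibot.Claim(repo, "P248", is_reference=True)   # stated in
        source.setTarget(pywikibot.ItemPage(repo, publication_qid))
        claim.addSource(source, summary="Adding bibliographic evidence")

add_bibliographic_reference("Q42", "P106", "Q43649390")             # placeholders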

What data will be added? The proposed scenario is meant to be tried first on articles published in the scientific open-access journal Apparatus. --Jkatzwinkel (talk) 06:15, 19 October 2017 (UTC)

I find this proposal very hard to understand without seeing an example - can you run one or mock one (or several) up using the neonionbot account so we can see what it would likely do? ArthurPSmith (talk) 13:12, 19 October 2017 (UTC)

Handelsregister

Handelsregister (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: SebastianHellmann (talk • contribs • logs)

Task/s: Crawl https://www.handelsregister.de/rp_web/mask.do, go to UT (Unternehmenstraeger), and add an entry for each German organisation to Wikidata with the basic info, especially the registering court and the id assigned by that court.

Code: The code is a fork of https://github.com/pudo-attic/handelsregister (small changes only)

Function details:

Task 1 (prerequisite for Task 2): Find all current organisations in Wikidata that are registered in Germany and find the corresponding Handelsregister entry. Then add the data for the respective Wikidata items.

What data will be added? The Handelsregister collects information from all German courts, where all organisations in Germany are obliged to register. The data is passed from the courts to a private company running the Handelsregister, which makes part of the information public (i.e. UT - Unternehmenstraegerdaten, core data) and sells the other part. Each organisation can be uniquely identified by the registering court and the number assigned by this court (the number alone is not enough, as two courts might assign the same number). Here is an example of the data:

  • Saxony District court Leipzig HRB 32853 – A&A Dienstleistungsgesellschaft mbH
  • Legal status: Gesellschaft mit beschränkter Haftung
  • Capital: 25.000,00 EUR
  • Date of entry: 29/08/2016
  • (When entering date of entry, wrong data input can occur due to system failures!)
  • Date of removal: -
  • Balance sheet available: -
  • Address (subject to correction): A&A Dienstleistungsgesellschaft mbH
  • Prager Straße 38-40
  • 04317 Leipzig

Most items are stable, i.e. each org is registered when it is founded and assigned a number by the court (e.g. Saxony District Court Leipzig HRB 32853). After that only the address and the status can change. For Wikidata it is no problem to keep companies that no longer exist, as they should be preserved for historical purposes.

Maintenance should be simple: Once a Wikidata item contains the correct court and the number, the entry can be matched 100% to the entry in Handelsregister. This way Handelsregister can be queried once or twice a year to update the info in Wikidata.
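
A sketch (not part of the fork mentioned above) of how task 1 could narrow down candidate Wikidata items to match a Handelsregister entry against, using the Wikidata Query Service. The item and property ids used here (business Q4830453, headquarters location P159, Leipzig Q2079) are illustrative choices, and the real composite key (court + number) still depends on a property that does not exist yet.

import requests

WDQS = "https://query.wikidata.org/sparql"

def candidate_companies(city_qid):
    query = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31/wdt:P279* wd:Q4830453 ;     # business (or a subclass)
            wdt:P159 wd:%s .                    # headquarters location
      SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en". }
    }
    """ % city_qid
    r = requests.get(WDQS, params={"query": query, "format": "json"}, timeout=120)
    r.raise_for_status()
    return {b["itemLabel"]["value"]: b["item"]["value"]
            for b in r.json()["results"]["bindings"]}

# e.g. candidates in Leipzig (Q2079) that the A&A Dienstleistungsgesellschaft entry
# above could be compared against by name before deciding to create a new item.
leipzig_candidates = candidate_companies("Q2079")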

Question 1 (bot or other tool): How is the data added? I am keeping the bot request, but I will look at Mix and Match first. Maybe this tool is better suited for task 1.

Question 2 (modeling): Which properties should be used in Wikidata? I am particularly looking for the property for the court as the registering organisation, i.e. the organisation that has the authority to define the identity of an org, and then also for the number (HRB 32853). The types, i.e. the legal status, can be matched to existing Wikidata entries. Most exist in the German Wikipedia. Any help with the other properties is appreciated.

Question 3 (legal): I still need to read up on the legal situation for importing crawled data. Here is a hint given on the mailing list:

https://en.wikipedia.org/wiki/Sui_generis_database_rights You'd need to check whether in Germany it applies to official acts and registers too... https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

Task 2: Add all missing identifiers for the remaining orgs in the Handelsregister. Task 2 can be rediscussed and decided once Task 1 is finished sufficiently.

It should meet notability criteria 2: https://www.wikidata.org/wiki/Wikidata:Notability

  • 2. It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references. If there is no item about you yet, you are probably not notable.

The reference is the official German business registry, which is serious and public. Orgs are also by definition clearly identifiable legal entities.

--SebastianHellmann (talk) 07:39, 16 October 2017 (UTC)

Could you make a few example entries to illustrate what the items you want to create will look like? What strategy will you use to avoid creating duplicate items? ChristianKl (talk) 12:38, 16 October 2017 (UTC)
I think this is a good idea, but I agree there needs to be a clear approach to avoiding creating duplicates - we have hundreds of thousands of organizations in wikidata now, many of them businesses, many from Germany, so there certainly should be some overlap. Also I'd like to hear how the proposer plans to keep this information up to date in future. ArthurPSmith (talk) 15:13, 16 October 2017 (UTC)
There was a discussion on the mailing list. It would be easier to complete the info for existing entries in Wikidata at first. I will check mix and match for this or other methods. Once this space is clean, we can rediscuss creating new identifiers. SebastianHellmann (talk) 16:01, 16 October 2017 (UTC)
Is there an existing ID that you plan to use for authority control? Otherwise, do we need a new property? ChristianKl (talk) 20:40, 16 October 2017 (UTC)
  • Given that this data is fairly frequently updated, how is it planned to maintain it?
    --- Jura 16:38, 16 October 2017 (UTC)
  • The frequency of updates is indeed large: a search for deletion announcements alone in the limited timeframe of 1.9.-15.10.17 finds 6682 deletion announcements (which legally is the most serious change and makes up approx. 10% of all announcements). So within one year, more than 50,000 companies are deleted - which for sure should be reflected in the corresponding Wikidata entries. Jneubert (talk) 15:44, 17 October 2017 (UTC)
Hi all, I updated the bot description, trying to answer all questions from the mailing list and here. I still have three questions, which I am investigating. Help and pointers highly appreciated. SebastianHellmann (talk) 23:36, 16 October 2017 (UTC)
  • Given that German is the default language in Germany I would prefer the entry to be "Sachsen Amtsgericht Leipzig HRB 32853" instead of "Saxony District court Leipzig HRB 32853". Afterwards we can store that as an external ID and make a new property for that (which would need a property proposal). ChristianKl (talk) 12:33, 17 October 2017 (UTC)
Thanks for the updated details here. It sounds like a new identifier property may be needed (unless one of the existing ones like Legal Entity ID (P1278) suffices, but I suspect most of the organizations in this list do not have LEI's (yet?)). Ideally an identifier property has some way to turn the identifiers into a URL link with further information on that particular identified entity, that de-referenceability makes it easy to verify - see "formatter URL" examples on some existing identifier properties. Does such a thing exist for the Handelsregister? ArthurPSmith (talk) 14:58, 17 October 2017 (UTC)

Kopiersperre Jklamo ArthurPSmith S.K. Givegivetake fnielsen rjlabs ChristianKl Vladimir Alexiev User:Pintoch Parikan User:Cardinha00 User:zuphilip


Notified participants of WikiProject Companies for input.

@SebastianHellmann: for task 1, you might also be interested in OpenRefine (make sure you use the German reconciliation interface to get better results). See https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation for details of its reconciliation features. I suspect your dataset might be a bit big though: I think it would be worth trying only on a subset (for instance, filter out those with a low capital). − Pintoch (talk) 14:52, 20 October 2017 (UTC)

Concerning Task 2, I'm a bit worried about the companies' notability (or lack thereof), since the Handelsregister includes any and all companies. Not just the big ones where there's a good chance that Wikipedia articles, other sources, external IDs, etc. exist. But also tiny companies and even one-person companies, like someone selling stuff on Ebay or some guy selling Christmas trees in his village. So it would be very hard to find any data on these companies outside the Handelsregister and the phonebook. --Kam Solusar (talk) 05:35, 21 October 2017 (UTC)

Agreed. Do we really need to be a complete copy of the Handelsregister? What for? How about concentrating on a meaningful subset instead that addresses a clear usecase? --LydiaPintscher (talk) 10:35, 21 October 2017 (UTC)
That of course is true. A strict reading of Wikidata:Notability could be seen as requiring at least two reliable sources. But then, that could be the phone book. Do we have to make those criteria more strict? That would require an RfC. Lymantria (talk) 07:58, 1 November 2017 (UTC)
I would at least try an RfC, but I am not immediately sure what to propose.--Ymblanter (talk) 08:05, 1 November 2017 (UTC)
If there's an RfC I would say that it should say that for data-imports of >1000 items the decision whether or not we import the data should be done via a request for bot permissions. ChristianKl (talk) 12:35, 4 November 2017 (UTC)
@SebastianHellmann: is well-intended, but I agree not all companies are notable. Even worse than 1-man shops are inactive companies that nobody bothered to close yet. Just "comes from reputable source" is not enough: eg OpenStreetMaps is reputable, and it would be ok to import all power-stations (eg see Enipedia) but imho not ok to import all recyclable garbage cans. We got 950k BG companies at http://businessgraph.ontotext.com/ but we are hesitant to dump them on Wikidata. Unfortunately official trade registers usually lack measures of size or importance...
It's true the Project Companies has not gelled yet and there's no clear Community of Use for this data. On the other hand, if we don't start somewhere and experiment, we may never get big quantities of company data. So I'd agree to this German data dump by way of experiment --Vladimir Alexiev (talk) 15:46, 19 November 2017 (UTC)

Kopiersperre Jklamo ArthurPSmith S.K. Givegivetake fnielsen rjlabs ChristianKl Vladimir Alexiev User:Pintoch Parikan User:Cardinha00 User:zuphilip


Notified participants of WikiProject Companies. Comment: As best I know, Project Companies has yet to gel up a workable (for the immediate term) notability standard, so the area remains fuzzy. Here is my current thinking [[3]]. Very much like the above automation of updates. Hopefully the fetching scripts for Germany can be generalizable to work in most developed countries that publish structured data on public companies. Would love to find Wikidata consensus on notability vs. its IT capacity and stomach for volumes of basically table data. Rjlabs (talk) 16:47, 3 November 2017 (UTC)

  • @Rjlabs: That hope is not founded because each jurisdiction does its own thing. OpenCorporates has a bunch of web crawling scripts (some of them donated) that they consider a significant IP. And as @SebastianHellmann: wrote their data is sorta open but not really. --Vladimir Alexiev (talk) 15:46, 19 November 2017 (UTC)
  • I Support importing the data. Having the data makes it easier to enter the employer when we create items for new people. Companies also engage in other actions that leave marks in databases, such as registering patents or trademarks, and it makes it easier to import such data when we already have items for the companies. The ability to run queries about the companies that are located in a given area is useful. ChristianKl (talk) 17:20, 3 November 2017 (UTC)
    • @ChristianKl: at least half of the 200M or so companies world-wide will never have notable employees nor patents, so "let's import them just in case" is not a good policy --Vladimir Alexiev (talk) 15:46, 19 November 2017 (UTC)
  • @Rjlabs: We did go back and forth with a lot of ideas on how to set some sort of criteria for company notability. I think any public company with a stock market listing should be considered notable, as there's a lot of public data available on those. For private companies we talked about some kind of size cutoff, but I suppose the existence of 2 or more independent reference sources with information about the company might be enough? ArthurPSmith (talk) 18:01, 3 November 2017 (UTC)
  • @ArthurPSmith:@Denny:@LydiaPintscher: Arthur, let's make it: any public company that trades on a recognized stock exchange, anywhere worldwide, with a continuous bid and ask quote, and that actually trades at least once per week, is automatically considered "notable" for Wikidata inclusion. This is by virtue of the fact that real people wrote real checks to buy shares, there is sufficient continuing trading interest in the stock to make it trade at least once per week, and some exchange somewhere has granted that firm a listing. We should also note that passing this hurdle means that SOME data on that firm is automatically allowable on Wikidata, provided the data is regularly updated. Rjlabs (talk) 19:35, 3 November 2017 (UTC)
    • @Rjlabs, Denny, LydiaPintscher: Public Companies are a no-brainer because there's only 60k in the world (there are about 2.6k exchanges); compare to about 200M companies world-wide. --Vladimir Alexiev (talk) 15:46, 19 November 2017 (UTC)
  • Some data means (for right now) information like LEI, name, address, phone, industry code(s), and a brief text description of what they do, plus about 10 high-level fields that cover the most frequently needed company data (such as: sales, employees, assets, principal exchange(s) down to where at least 20% of the volume is traded, unique symbol on that exchange, CEO, URL to the investor relations section of the website where detailed financial statements may be found, Central Index Key (or equivalent) with a link to regulatory filings / structured data in the primary country where it's regulated). For now that is all that should be "automatically allowable". No detailed financial statements, line by line, going back 10-20 years, with adjustments for stock splits, etc. No bid/offer/last trade time series. Consensus on further detail has to wait for further gelling up. I ping Lydia and Denny here to be sure they are good with this potential volume of linked data. (I think it would be great, a good start and limited. I especially like it if it MANDATES LEI, if one is available.) Moving down from here (after 100% of public companies that are alive enough to actually trade) there is of course much more. However it's a very murky area. >=2 independent reference sources with information about the company might be too broad, causing Wikidata capacity issues, or it may be too burdensome if someone has a structured data source that is much more reliable than Wikidata to feed in, but lacks that "second source". Even if there was one absolutely assured good quality source, and Wikidata capacity was not an issue, I'd like to see a "sustainability" requirement up front. Load no private company data where it isn't AUTOMATICALLY updated or expired out. Again, would be great to have further Denny/Lydia input here on any capacity concern. Rjlabs (talk) 19:35, 3 November 2017 (UTC)
    • "A modicum of data" as you describe above is a good criterion for any company. --Vladimir Alexiev (talk)
    • At WikidataCon there was a question from the audience about whether Wikidata would be okay with importing the 400 million entries about items in museums that are currently managed by various museums. User:LydiaPintscher answered by saying that her main concerns aren't technical but whether our community does well with handling a huge influx of items. Importing data like the Handelsregister will mean that there will be a lot of items that won't be touched by humans, but I don't think that's a major concern for our community. Having more data means more work for our community but it also means that new people get interested in interacting with Wikidata. When we make decisions like this, technical capabilities however matter. I think it would be great if a member of the development team would write a longer blog post that explains the technical capabilities, so that we can better factor them into our policy decisions. ChristianKl (talk) 12:35, 4 November 2017 (UTC)
I agree with Lydia. The issue is hardly the scalability of the software - the software is designed in such a way that there *should* not be problems with 400M new items. The question is do we have a story as a community to ensure that these items don't just turn into dead weight. Do we ensure that items in this set are reconciled with existing items if they should be? That we can deal with attacks on that dataset in some way, with targeted vandalism? Whether the software can scale, I am rather convinced. Whether the community can scale, I think we need to learn that.
Also, for the software, I would suggest not to grow 10x at once, but rather to increase the total size of the database with a bit more measure, and never to more than double it in one go. But this is just, basically, for stress-testing it, and to discover, if possible, early unexpected issues. But the architecture itself should accommodate such sizes without much ado (again - "should" - if we really go for 10x, I expect at least one unexpected bug to show up). --Denny (talk) 23:25, 5 November 2017 (UTC)
Speaking of the community being able to handle dead weight, it seems we mostly lack the tools to do so. Currently we are somewhat flooded by items from cebwiki and, despite efforts by individual users to deal with one or the other problem, we still haven't tackled them systematically, and this has led to countless items with unclear scope, complicating every other import.
--- Jura 07:00, 6 November 2017 (UTC)
I don't think we should just add 400M new items in one go either. I don't think that the amount of vandalism that Wikidata faces scales directly with the number of items that we host: if we double the number of items, we don't double the amount of vandalism.
As far as the cebwiki items go, the problem isn't just that there are many items. The problem is that there's unclear scope for a lot of the items. For me that means that when we allow massive data imports we have to make sure that the imported data is up to a high quality where the scope of every item is clear. This means that having a bot approval process for such data imports is important and suggests to me that we should also get clear about the necessity of having a bot approval for creating a lot of items via QuickStatements.
Currently, we are importing a lot of items via WikiCite and it seems to me that process is working without significant issues.
I agree that scaling the community should be a higher priority than scaling the number of items. One implication of that is that it makes sense to have higher standards for mass imports via bots than for items added by individuals (a newbie is more likely to become involved in our community when we don't greet them by deleting the items they created).
Another implication is that the metric we celebrate shouldn't be focused on the number of items or statements/item but on the number of active editors. ChristianKl () 09:58, 20 November 2017 (UTC)

The check just ended with 3,602 matching works - less than expected. Import will start very soon. Peuc (talk) 02:45, 16 October 2017 (UTC)


Alexabot

Alexabot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Tozibb (talk • contribs • logs)

Task/s:


Code:
Currently the code is only available on my computer; once the bot gets approved for testing I will upload it to GitHub or something similar.


Function details:

  • searches en.wikipedia.org for pages with Alexa rankings
  • obtains their corresponding Wikidata item (if existent)
  • obtains the official website (P856) of the Wikidata item
  • queries Alexa API to obtain the rank of the website
  • inserts or updates property Alexa rank (P1661) accordingly
  • requested and inspired by Jc86035 (talkcontribslogs)

--Tozibb (talk) 15:39, 24 September 2017 (UTC)
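
A condensed sketch of the update step listed in the function details above, assuming pywikibot; this is not Tozibb's code, the Alexa API call itself is stubbed out, and the point-in-time qualifier anticipates the discussion further down.

import pywikibot

def fetch_alexa_rank(url):
    """Placeholder for the Alexa API call; credentials and request format are not shown."""
    raise NotImplementedError

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def update_alexa_rank(qid, when):
    item = pywikibot.ItemPage(repo, qid)
    websites = item.get()["claims"].get("P856", [])              # official website
    if not websites:
        return
    rank = fetch_alexa_rank(websites[0].getTarget())
    claim = pywikibot.Claim(repo, "P1661")                       # Alexa rank (quantity)
    claim.setTarget(pywikibot.WbQuantity(amount=rank, site=repo))
    item.addClaim(claim, summary="Updating Alexa rank")
    qualifier = pywikibot.Claim(repo, "P585", is_qualifier=True) # point in time
    qualifier.setTarget(when)                                    # a pywikibot.WbTime
    claim.addQualifier(qualifier)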

Tozibb, thanks for working on this. Will this be for English Wikipedia only or will it also get URLs from other WMF wikis? Jc86035 (talk) 15:43, 24 September 2017 (UTC)
Jc86035, so far only the English Wikipedia is queried, but I can query other Wikipedias as well. What would you suggest? Maybe the ten largest Wikipedias? --Tozibb (talk) 16:55, 24 September 2017 (UTC)
@Tozibb: I think the ten largest should be fine for now (maybe expand to twenty later). Jc86035 (talk) 16:58, 24 September 2017 (UTC)
Please perform some (50-100) test edits and create the user page for your bot (with e.g. {{Bot}}). Lymantria (talk) 07:37, 26 September 2017 (UTC)
pinging Tozibb Jc86035 (talk) 03:29, 1 October 2017 (UTC)
I created the bot account page Alexabot and ran five test edits. However I ran into a permission error:
pywikibot.data.api.APIError: failed-save: The save has failed.
[help: See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.]
[messages: 'wikibase-api-failed-save' ("The save has failed."); 'protectedpagetext' (an HTML notice, in summary: "This page is currently create-protected and can be created only by established registered users. ... If you do not have an account, you may create one. After 50 edits and 4 days, you will be able to edit semi-protected pages. Accounts that don't meet these requirements may bypass them with the confirmed user right, which may be requested at Wikidata:Requests for permissions#Requests for other rights. ... You may ask for unprotection at Wikidata:Administrators' noticeboard."); 'no-permission']

Lymantria, do you know what the reason for this error is and/or how to resolve it? Thanks for your input.

Pinging Lymantria. Tozibb, this is because your bot was not yet autoconfirmed (4 days, 50 edits), although it now is. I don't know which page it was editing, but autoconfirmed is usually the highest level of protection applied on items, so it should be a non-issue now. Jc86035 (talk) 13:10, 2 October 2017 (UTC)
If the problem was being autoconfirmed, it should be solved now. I granted the "confirmed" flag. Lymantria (talk) 13:15, 2 October 2017 (UTC)
The problem is resolved. Thanks Lymantria --Tozibb (talk) 16:24, 8 October 2017 (UTC)
Tozibb, please go over the data you added and add the qualifier point in time (P585) 1 October 2017 and source reference URL (P854) https://www.alexa.com/siteinfo/$1. You can probably do this by downloading a property–value table for the items your bot edited, adding the qualifier/reference, and sticking it into QuickStatements (although maybe you'd prefer to do it using pywiki). Also, you should remove the duplicate values that you initially added. Jc86035 (talk) 13:22, 2 October 2017 (UTC)

Lymantria Jc86035,

I continued working on the bot, which now adds point in time (P585) and reference URL (P854) along with archive URL (P1065) as sources for Alexa rank (P1661). Furthermore it is now possible to set a minimum number of days before a new Alexa rank claim is added. The latest Alexa rank (P1661) is set to the "preferred" rank. My test run includes Alexa rank (P1661) updates for ten items. Could you please check/verify its work? If all is fine, I would like to increase the number of items to be edited. Furthermore I would like your opinion on the timespan between two updates of Alexa rank (P1661). I don't want to "spam" Wikidata by letting it run every 5 days or so. I think every 2 or 4 weeks should be good to capture the Alexa ranking - this seems to be a good tradeoff to me.

Thanks for your help on this. --Tozibb (talk) 15:12, 8 November 2017 (UTC)

@Tozibb: It's working fine, except point in time (P585) should be added as a qualifier and not as part of the reference. Is the archival of the Alexa ranking page updated automatically as well? I think a time period of four weeks should work. Jc86035 (talk) 10:27, 9 November 2017 (UTC)
@Jc86035: Thanks for your feedback. I added point in time (P585) as a qualifier (currently testing on Wikidata Sandbox (Q4115189)). I trigger the archiving of the Alexa ranking page and use a timestamp (now) to create archive URL (P1065). This way archive.org returns the snapshot of the requested page closest to the specified timestamp in the URL. I will update 5 more items with the new Alexa ranking soon to continue testing.

If no one opposes then I will configure the bot to update every 4 weeks/28 days. I am also thinking of cleaning up/deleting Alexa rank (P1661) values with no point in time (P585) qualifier, or is there a good reason to keep ranking values with no point in time (P585)? Thanks for your input. --Tozibb (talk) 22:58, 9 November 2017 (UTC)

Jntent's Bot 1

Jntent's Bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Jntent (talk • contribs • logs)

Task/s:

The task is to add assertions about airports from template pages.

Code:

The code is based on pywikibot's harvest_template.py under scripts in https://github.com/wikimedia/pywikibot-core

Function details:


I added some constraints for literal values with regular expressions to parse "Infobox Airport" and similar templates in other languages; see the table of template elements, properties and constraining regexes below.

I hope to scrape the airport templates from a few languages; the "Infobox Airport" template contains links to pages about airport codes. Here is an example of the template:

{{Infobox airport
| name         = Denver International Airport
| image        = Denver International Airport Logo.svg
| image-width  = 250
| image2       = DIA Airport Roof.jpg
| image2-width = 250
| IATA         = DEN
| ICAO         = KDEN
| FAA          = DEN
| WMO          = 72565
| type         = Public
| owner        = City & County of Denver Department of Aviation
| operator     = City & County of Denver Department of Aviation
| city-served  = [[Denver]], the [[Front Range Urban Corridor]], Eastern Colorado, Southeastern Wyoming, and the [[Nebraska Panhandle]]
| location     = Northeastern [[Denver]], [[Colorado]], U.S.
| hub          =
...
}}

I will use links to pages about airport codes to find airports. One example is:

https://en.wikipedia.org/wiki/International_Air_Transport_Association_airport_code

Template element → property and constraining regex (from the properties):

  • IATA → Property:P238, regex [A-Z]{3}
  • ICAO → Property:P239, regex ([A-Z]{2}|[CKY][A-Z0-9])[A-Z0-9]{2}
  • FAA → Property:P240, regex [A-Z0-9]{3,4}
  • coordinates → Property:P625, 6 numbers and 2 cardinalities surrounded by "|" from the coord template: {{coord|39|51|42|N|104|40|23|W|region:US-CO|display=inline,title}}
  • city-served → Property:P931, the first valid link (standard harvest_template.py behavior)
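
Not the modified harvest script itself, just a Python sketch of the constraint check the table above describes: harvested values are only accepted when they match the property's format regex.

import re

CONSTRAINTS = {
    "P238": r"[A-Z]{3}",                               # IATA
    "P239": r"([A-Z]{2}|[CKY][A-Z0-9])[A-Z0-9]{2}",    # ICAO
    "P240": r"[A-Z0-9]{3,4}",                          # FAA
}

def accept(prop, raw_value):
    """Return the cleaned value if it satisfies the regex constraint, else None."""
    value = raw_value.strip()
    pattern = CONSTRAINTS.get(prop)
    if pattern and re.fullmatch(pattern, value):
        return value
    return None

assert accept("P239", "KDEN") == "KDEN"   # valid ICAO code from the example infobox
assert accept("P238", "denver") is None   # rejected: does not match [A-Z]{3}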

 – The preceding unsigned comment was added by Jntent (talk • contribs).

  • Comment: I think there were some problems with these infoboxes in one language. Not sure which one it was. Maybe Innocent bystander recalls (I think he mentioned it once).
    --- Jura 11:28, 8 July 2017 (UTC)
    Well, I am not sure if I (today) remember any such problems. But it could be worth mentioning that these codes can also be found in sv:Mall:Geobox and ceb:Plantilya:Geobox, which are used in the Lsjbot articles. These templates are not specially adapted to airports, but Lsj used the same template also for this group of articles. The Swedish template has special parameters for this ("IATA-kod" and "ICAO-kod") while the cebwiki articles use the parameters "free" and "free_type". (Could be worth checking free1, free2 too.) See ceb:Coyoles (tugpahanan) as an example. -- Innocent bystander (talk) 15:17, 8 July 2017 (UTC)
  • @Jntent: in this edit I see the bot replaced FDKS with FDKB, while in the en.wp infobox and lead section there are two values for the ICAO code: FDKS/FDKB. I would suggest not changing any existing values, or these should probably be checked manually if changed. The safest way to act here would be to just add missing values. XXN, 14:07, 17 July 2017 (UTC)


WikiCompBot

WikiCompBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: WikiCompBot (talk • contribs • logs)

Task/s: To fetch company URLs

Function details: This bot is specifically designed to scrape the official websites of companies in a particular country --WikiCompBot (talk) 05:43, 9 March 2017 (UTC)

@WikiCompBot: the bot operator should have a separate account for editing. From where do you plan to scrape the websites? --XXN, 20:18, 30 June 2017 (UTC)
Could you provide a bit more details about this task? How will you scrape and from where? --Pasleim (talk) 08:07, 11 July 2017 (UTC)

WikiProjectFranceBot

WikiProjectFranceBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Alphos (talk • contribs • logs)

Task/s: Replace all located in the administrative territorial entity (P131) statements pointing from communes of France to cantons of France by territory overlaps (P3179) statements pointing from the same communes to the same cantons, including qualifiers (there are currently only date qualifiers), and adding a P794 qualifier on each new statement to indicate the subclass of canton.

Code: Partially available (for the first step) on GitHub

Function details: As has been the plan of WikiProject France since we proposed properties to better reflect the relationship between communes and cantons of France, we're now getting to actually push all the statements corresponding to these relationships from located in the administrative territorial entity (P131) to territory overlaps (P3179), and add the exact kind of P3179 this represents as qualifiers to said statements, without removing the original statements at first. Roughly 80 000 edits are to be expected.

At a later date, after checking everything went fine on the first pass, we plan on removing the (faulty) P131 statements between communes and cantons entirely, which will also be done by this bot.
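
A sketch of the per-statement edit described above, assuming pywikibot; this is not Alphos' actual bot, and the QIDs in the last line correspond to the first example given further down.

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def copy_to_p3179(commune_qid, canton_qid, canton_kind_qid, end_time=None):
    commune = pywikibot.ItemPage(repo, commune_qid)

    overlap = pywikibot.Claim(repo, "P3179")                   # territory overlaps
    overlap.setTarget(pywikibot.ItemPage(repo, canton_qid))
    commune.addClaim(overlap, summary="Moving commune/canton link to P3179")

    kind = pywikibot.Claim(repo, "P794", is_qualifier=True)    # "as": which kind of canton
    kind.setTarget(pywikibot.ItemPage(repo, canton_kind_qid))
    overlap.addQualifier(kind)

    if end_time is not None:                                   # carried-over date qualifier
        end = pywikibot.Claim(repo, "P582", is_qualifier=True)
        end.setTarget(end_time)                                # a pywikibot.WbTime
        overlap.addQualifier(end)

copy_to_p3179("Q1000003", "Q1726007", "Q18524218")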

VIGNERON (talk)
Mathieudu68 (talk)
Tpt (talk) (si besoin d'aide technique)
Ayack (talk) (surtout pour les MH)
Aga
Ash Crow (talk)
Tubezlob (talk)
PAC2 (talk)
Thierry Caro (talk)
Pymouss (talk)
Pintoch (talk)
Alphos (talk)
Nomen ad hoc (talk)
Framawiki (talk)
GAllegre (talk)
Peter17 (talk)
Notified participants of WikiProject France

--Alphos (talk) 20:00, 8 May 2017 (UTC)

@Alphos: Could you provide an example please? Thanks. — Ayack (talk) 09:05, 9 May 2017 (UTC)
Of course.
Nielles-lès-Bléquin (Q1000003) located in the administrative territorial entity (P131) canton of Lumbres (Q1726007)
would be replaced by :
Nielles-lès-Bléquin (Q1000003) territory overlaps (P3179) canton of Lumbres (Q1726007) (as (P794) canton of France (starting March 2015) (Q18524218))
and
Sainte-Croix (Q1002122) located in the administrative territorial entity (P131) canton of Montluel (Q1726339) (end time (P582) 2015-03-21)
would be replaced by :
Sainte-Croix (Q1002122) located in the administrative territorial entity (P131) canton of Montluel (Q1726339) (end time (P582) 2015-03-21 ; as (P794) canton of France (until 2015) (Q184188))
Other "examples" (in fact the whole list) can be found here :
The following query uses these:
  • Properties: subclass of (P279) View with Reasonator View with SQID, instance of (P31) View with Reasonator View with SQID, located in the administrative territorial entity (P131) View with Reasonator View with SQID
     1 SELECT DISTINCT ?commune ?canton ?qualProp ?time ?precision ?timezone ?calendar WHERE {
     2   ?commune p:P31/ps:P31/wdt:P279* wd:Q484170 .
     3   ?commune p:P131 ?cantonStmt .
     4   ?cantonStmt ps:P131 ?canton .
     5   ?canton wdt:P31 ?cantonType .
     6   VALUES ?cantonType { wd:Q18524218 wd:Q184188 } .
     7   OPTIONAL {
     8     ?cantonStmt ?qualifier ?qualVal .
     9     ?qualProp wikibase:qualifierValue ?qualifier .
    10     ?qualVal wikibase:timePrecision ?precision ;
    11              wikibase:timeValue ?time ;
    12   	         wikibase:timeTimezone ?timezone ;
    13              wikibase:timeCalendarModel ?calendar ;
    14   }
    15 }
    16 ORDER BY ASC(?commune) ASC(?canton)
    
(which is what the bot works on)
Alphos (talk) 09:44, 9 May 2017 (UTC)
Support: The query seems good to me. Can you run a sample batch? -Ash Crow (talk) 18:26, 14 May 2017 (UTC)
The query is undeniably good, but I noticed an issue with edge cases on cantons with double status; working on it and running a small batch (LIMIT 20 or maybe a small French département), probably later this week. Alphos (talk) 00:05, 16 May 2017 (UTC)
Support – Ayack (talk) 09:02, 16 May 2017 (UTC)
Please, let the bot run a couple of test edits. Besides, please, create the user page of the bot account (e.g. {{bot|Alphos}}). Lymantria (talk) 20:40, 25 June 2017 (UTC)

legislator info

legislator info (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Neilaronson (talk • contribs • logs)

Task/s: Get data on California state legislators for my school project trying to see if I can predict the success of bills in the California state legislature.

Code: https://github.com/neilaronson/ca_bills_project

Function details: Find pages for California state legislators and save the page content for parsing and feature creation for my model. --Neilaronson (talk) 19:43, 13 May 2017 (UTC)

  • @Neilaronson: will this bot edit Wikidata? --XXN, 20:25, 30 June 2017 (UTC)
  • Which data do you plan to save on Wikidata? --Pasleim (talk) 08:11, 11 July 2017 (UTC)

Jefft0Bot

Jefft0Bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Jefft0 (talk • contribs • logs)

Task/s: Add references to external ontologies

Code:

Function details: Add equivalent class (P1709) for an external ontology when that ontology already defines mappings to Wikipedia or Wikidata.
For example, Umbel version 1.50 has mappings to Wikipedia here: https://raw.githubusercontent.com/structureddynamics/UMBEL/d3d1d6c0a566fed335fecfadb75f5501437f9163/External%20Ontologies/wikipedia.n3
such as
<http://umbel.org/umbel/rc/MaoriLanguage> umbel:isRelatedTo <http://wikipedia.org/wiki/Māori_language> .
and that Wikipedia page links to Wikidata item Māori (Q36451) . So this item should have equivalent class (P1709) to http://umbel.org/umbel/rc/MaoriLanguage with a reference URL (P854) to the file above. --Jefft0Bot (talk) 15:15, 17 April 2017 (UTC)
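
A hedged sketch of the described import, assuming rdflib for parsing the n3 file and pywikibot for the edits; this is not Jefft0's code and the URL handling is simplified.

from urllib.parse import unquote

import pywikibot
import rdflib

MAPPING_URL = ("https://raw.githubusercontent.com/structureddynamics/UMBEL/"
               "d3d1d6c0a566fed335fecfadb75f5501437f9163/External%20Ontologies/wikipedia.n3")

graph = rdflib.Graph()
graph.parse(MAPPING_URL, format="n3")

enwiki = pywikibot.Site("en", "wikipedia")
repo = enwiki.data_repository()

for umbel_class, _, wp_uri in graph:
    wp_uri = str(wp_uri)
    if "wikipedia.org/wiki/" not in wp_uri:
        continue
    title = unquote(wp_uri.split("/wiki/")[-1]).replace("_", " ")
    try:
        item = pywikibot.ItemPage.fromPage(pywikibot.Page(enwiki, title))
    except pywikibot.exceptions.NoPageError:
        continue
    claim = pywikibot.Claim(repo, "P1709")                    # equivalent class (URL datatype)
    claim.setTarget(str(umbel_class))
    item.addClaim(claim, summary="Adding UMBEL equivalent class mapping")
    ref = pywikibot.Claim(repo, "P854", is_reference=True)    # reference URL
    ref.setTarget(MAPPING_URL)
    claim.addSource(ref)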

Please make several test edits.--Ymblanter (talk) 19:48, 28 July 2017 (UTC)

hz.cmu.bot

hz.cmu.bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Hz.cmu (talk • contribs • logs)

Task/s: analyze temporal patterns of wikipedia articles and associated editors

Code:

Function details: extract articles and revision histories; quantify features; study temporal patterns --Hz.cmu (talk) 16:16, 14 March 2017 (UTC)

  • @Hz.cmu: could you provide some concrete examples of what the bot will do? --XXN, 20:44, 30 June 2017 (UTC)

MsynBot 1

MsynBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: MisterSynergy (talk • contribs • logs)

Task/s: tidy claims of properties with quantity datatype; this means in particular: corrections of units; removal of inapplicable bounds; occasionally correction of quantity values including source addition

Code: in PAWS; maybe later in BitBucket as well

Function details:

  • The functional goal is already described above
  • I use pywikibot with PAWS right now, all scripts are hosted there as well; I consider switching to a tool labs installation of pywikibot at some point later, but this might not happen soon; code would then be hosted at my BitBucket repository
  • I already used a couple of self-written scripts for some smaller correction jobs with my regular account. Example: removal of bounds. However, there are two jobs planned that include a five-figure number of edits each, so this is no longer a small correction job
  • The scripts have hard-coded lists of items and properties which they should touch; there is no automatic item retrieval via querying, and no permanent operation is intended (i.e. one-time job for now)
  • Correction of multiple claims are bundled in one edit, if possible (see example).

MisterSynergy (talk) 18:09, 20 June 2017 (UTC)

Support: Looks good to me. Matěj Suchánek (talk) 18:42, 20 June 2017 (UTC)
Although you show one edit of your own as an example, could your bot perform a couple of test edits? Lymantria (talk) 20:49, 20 June 2017 (UTC)
I will do so soon and inform you with a ping on this page. After I handed in this RfP yesterday, I decided to try to get my Tool Labs pywikibot installation running, in order not to rely on PAWS. This needs some config, but things look good already. —MisterSynergy (talk) 04:51, 21 June 2017 (UTC)
It seems to me bots should be described in a way that editors who are not conversant with whatever language the bot is written in can still understand what the bot is supposed to do. This description seems inadequate to me. Jc3s5h (talk) 22:00, 20 June 2017 (UTC)
I am willing to add more detail, but unfortunately I am not fully sure which part of the idea needs more explanation. Can you please ask more specifically? Thanks, MisterSynergy (talk) 04:51, 21 June 2017 (UTC)
"tidy claims of properties with quantity datatype; this means in particular: corrections of units; removal of inapplicable bounds; occasionally correction of quantity values including source addition" is not a good description? The stuff below is the Function details. Matěj Suchánek (talk) 06:23, 21 June 2017 (UTC)

Okay, more background:

On quantity properties in general (Help:Data type#Quantity): quantity claims can have snaktypes novalue, somevalue and value just as all other claims. In case of a value snaktype, the value consists of up to four parts:

  1. amount, always a numerical value (mandatory)
  2. unit, either string '1' (means “no unit”, appears as Q199 sometimes) or the entity representation of a unit item, such as string 'http://www.wikidata.org/entity/Q11574' for unit second (Q11574); the unit part is mandatory as well, even for quantities which are unit-less
  3. upperBound and lowerBound, absolute numerical values, always (?) symmetrical around amount (i.e. 100±1 has upperBound=101 and lowerBound=99; 100±0 has upperBound=100 and lowerBound=100); this was mandatory in the past, but it is not any longer and we can store bound-less quantities where these fields are just not there; bounds express the uncertainty interval of the quantity

Since we cannot use snaktypes for the individual parts of a quantity, we need to signal the absence of quantity parts differently. Oddly, this is inconsistently solved right now: “no unit” is expressed by 'unit':'1', while “no bounds” (i.e. no uncertainty interval) is expressed by the absence of lowerBound and upperBound. Bounds and units with “somevalue” character should not happen and can be ignored here in my experience.
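To make the two storage styles concrete, a minimal illustration in Python/JSON notation; the numbers are arbitrary, and only the entity URI for second (Q11574) is taken from the description above:

    # Quantity with unit second (Q11574) and an uncertainty interval of ±1:
    quantity_with_bounds = {
        "amount": "+100",
        "unit": "http://www.wikidata.org/entity/Q11574",
        "upperBound": "+101",
        "lowerBound": "+99",
    }
    # Unit-less, bound-less quantity: "no unit" is the string '1',
    # "no bounds" is simply the absence of upperBound/lowerBound:
    quantity_without_bounds = {
        "amount": "+100",
        "unit": "1",
    }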

On the situation of units used in properties:

  • If an editor enters a quantity claim into an item via the web interface, the unit field is marked “optional”, although the unit isn’t really optional for many properties. If errors happen due to forgotten/ignored units, they cannot be resolved automatically.
  • However, editors occasionally select the wrong unit by mistake, e.g. second (Q636099) instead of second (Q11574), and this can in fact be fixed in certain constricted areas.
  • There have also been (automatic) data imports in which the unit was apparently just forgotten, so the quantity appears unit-less although it actually should have a unit. There was a discussion recently at WD:PC (Wikidata:Project chat#Unitless claims) about what to do with those cases, including case numbers per quantity property. I found that a large fraction of the items referred to there did indeed have a proper source provided in an external identifier, which can be crawled and evaluated automatically and which provides information about the missing unit. This can be fixed automatically as well in some cases.

On the situation of bounds:

  • Due to the changes in how we store bounds in quantity values, they are used quite inconsistently right now. For a while, something like ±1 (or ±0?) was automatically added to quantity values. This uncertainty interval was simply derived from the precision of the amount given, and it was very annoying for the editors who did not ask for it. I believe that most of these bounds are in fact wrong, but there is unfortunately little we can do to correct them automatically. However, after bounds became optional, editors went on to use quantities without bounds to correctly express the non-existence of an uncertainty interval for the given claim. There are plenty of items which use both styles at the same time.
  • There are also many quantity properties which are in fact (more or less) completely unsuitable for bounds. Uncertainty expresses imperfect information, as is the case for physical quantities in particular and for most quantities which have been measured/extrapolated/simulated/… in general. Yet the quantity datatype is also used for properties which inherently do not have uncertainties, such as:
    • Elo rating (P1087): calculated with an algorithm, based on input which is completely known and in the past
    • ranking (P1352): used a lot for sports results
    • maximum capacity (P1083): used for sports venues; not a measured quantity, but merely a regulation by some authority
    • number of children (P1971): simply counted; if unclear (e.g. amount is 2 or 3), we’d rather provide claims with different values and sources than an amount of “2.5±0.5”; the same applies to many other “number of …” properties
  • I proposed a “no bounds” constraint at Template talk:Constraint#Requests: Integer value and No bounds recently, but unfortunately this has been ignored so far. It would help a lot to improve the use of bounds.

How I plan to work:

  • The current version of the script is at [4]; I am not sure whether it is visible to other users, as one at least needs to log in to PAWS. However, I have changed plans and set up my pywikibot installation at Tool Labs, which will require some changes to the script (separating the input data from the actual script). I already have a BitBucket repo which will then hold the code of the Python scripts. If you cannot see the code in PAWS right now but want to look at it, I will provide it differently on request.
  • I start with manual queries to get an impression of unit or bounds use, usually per property. If I encounter a fixable situation, I create (python) lists of items to be worked on, such as inputdata = [ 'Q1', 'Q2', 'Q42' ]. The script also knows which quantity properties to work on and loops over all items and properties (or qualifiers) to correct data if necessary. Examples would be:
    • Remove bounds from a given property or qualifier only if they are ±0 on a predefined set of items
    • Replace a given unit by another one in a given property or qualifier on a predefined set of items
  • Although we do have a lot of quantity properties with problems (as outlined above) right now, I do not plan to work on more than a small fraction of them. Most work will focus on sports-related properties, and if someone asks for different data/properties to work on I will of course provide help there as well. If necessary, I will show up on property talk pages to discuss my plans.

MisterSynergy (talk) 10:14, 21 June 2017 (UTC)
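For illustration, a minimal sketch of the first example job (removing ±0 bounds from a predefined set of items), assuming pywikibot; the item and property lists are placeholders, and a real run would bundle the corrections per item instead of editing claim by claim:

    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    inputdata = ["Q1", "Q2", "Q42"]   # placeholder: hard-coded item list
    properties = ["P1087"]            # placeholder: quantity properties to tidy

    for qid in inputdata:
        item = pywikibot.ItemPage(repo, qid)
        item.get()
        for pid in properties:
            for claim in item.claims.get(pid, []):
                quantity = claim.getTarget()   # a pywikibot.WbQuantity, or None
                if quantity is None:
                    continue
                # drop the bounds only when they are exactly ±0
                if (quantity.upperBound == quantity.amount
                        and quantity.lowerBound == quantity.amount):
                    new_value = pywikibot.WbQuantity(
                        amount=quantity.amount, unit=quantity.unit, site=repo)
                    claim.changeTarget(new_value,
                                       summary="remove inapplicable ±0 bounds")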

Symbol support vote.svg Support In view of the way the bot will be run on well-identified sets of data with well-understood errors, I support it. Jc3s5h (talk) 12:49, 21 June 2017 (UTC)

Symbol support vote.svg Support thanks for the detailed description of the plans here. I think it would be good for you to make a note on the property talk page each time your bot starts updating values for that property. ArthurPSmith (talk) 14:32, 23 June 2017 (UTC)

Please make some test edits.--Ymblanter (talk) 10:52, 24 June 2017 (UTC)
I will do so, and inform you as well as Lymantria (as mentioned above) as soon as this has happened. I am a bit too much involved in a WD:AN discussion at the moment. —MisterSynergy (talk) 10:56, 24 June 2017 (UTC)

MexBot 2[edit]

MexBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: MarcAbonce (talkcontribslogs)

Task/s: Add official population data for Mexican municipalities.

Code: https://gitlab.com/a01200356/MexBot/blob/master/poblaciones.py

Function details:
The script finds all Mexican municipalities with an INEGI municipality ID and gets all the official population data available from the API of INEGI (the Mexican public institute that conducts the census).
It will either add or update this data, with INEGI as the source.
It will also add census as the determination method for years ending in 0, when the census is conducted.
MarcAbonce (talk) 21:45, 8 June 2017 (UTC)
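As an illustration of the add/update step, a hedged pywikibot sketch (not the actual code from the GitLab repository); population (P1082), point in time (P585), determination method (P459), census (Q39825) and stated in (P248) are standard Wikidata entities, while the QIDs for INEGI and the example municipality are placeholders:

    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    def add_population(qid, population, year, is_census):
        """Add a population (P1082) claim with a point in time qualifier and,
        for census years, determination method (P459) = census (Q39825)."""
        item = pywikibot.ItemPage(repo, qid)
        claim = pywikibot.Claim(repo, "P1082")
        claim.setTarget(pywikibot.WbQuantity(amount=population, site=repo))
        item.addClaim(claim)

        when = pywikibot.Claim(repo, "P585")
        when.setTarget(pywikibot.WbTime(year=year))
        claim.addQualifier(when)

        if is_census:
            method = pywikibot.Claim(repo, "P459")
            method.setTarget(pywikibot.ItemPage(repo, "Q39825"))  # census
            claim.addQualifier(method)

        source = pywikibot.Claim(repo, "P248")                    # stated in
        source.setTarget(pywikibot.ItemPage(repo, "Q#######"))    # placeholder: item for INEGI
        claim.addSources([source])

    # e.g. add_population("Q#######", 25000, 2010, is_census=True)  # placeholder municipality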

Symbol support vote.svg Support --PokestarFan • Drink some tea and talk with me • Stalk my edits • I'm not shouting, I just like this font! 23:16, 8 June 2017 (UTC)
Pictogram voting comment.svg Comment: Under which license does INEGI publish population data? XXN, 14:41, 9 June 2017 (UTC)
It is not stated explicitly, but it is like CC BY; see point f in the section "Del libre uso de la información del INEGI" of the Términos de uso. I don't think it is compatible. --ValterVB (talk) 17:35, 9 June 2017 (UTC)
Indeed, it only requires attribution, which is precisely what my script intends to add. Why would it be incompatible? Most of this data has already been manually added by people and apparently a Wikipedia scraping script too, but it's mostly unsourced. --MarcAbonce (talk)
Here we use CC0; if data here needs citation, the data is incompatible with the license. --ValterVB (talk) 05:47, 11 June 2017 (UTC)
Can census data even be licensed, though? As far as I know, facts cannot be licensed anywhere. If this is the case, this license would only be enforceable with the statistical data they generate (which I'm not using) but it wouldn't be enforceable for a simple, "natural" fact such as a total population.
Also, as I mentioned, this data is already allowed in practice. Wikipedia importing bots have added census data into Wikidata by claiming Wikipedia as the source (which is also CC0 incompatible, by the way), but this data is not generated by Wikipedia, but rather taken from INEGI and imported without source.
So, unless you actually plan to delete all the unsourced and Wikipedia sourced Mexican population data from this site, the most reasonable thing to do would be to treat this data the way it has been treated so far, for the sake of consistency.
--MarcAbonce (talk)
Symbol support vote.svg Support Mexico is outside of the EU and thus there are no sui generis database right concerns. Population data itself is about facts that by their nature aren't protected by copyright. ChristianKl (talk) 09:31, 25 June 2017 (UTC)
The license does not depend on whether Mexico is in or out of the EU. Wikidata uses CC0; INEGI explicitly asks that one "must give credit to INEGI as an author", so for me they aren't compatible. --ValterVB (talk) 14:32, 25 June 2017 (UTC)

Emijrpbot 8[edit]

Emijrpbot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Emijrp (talkcontribslogs)

Task/s:

The bot adds imported from (P143) references to Wikinews article (Q17633526) items. In particular, it adds references to instance of (P31) and language of work or name (P407) claims. See example

Code: not coded yet

Function details:

The bot uses the sitelink to detect which language version of Wikinews hosts the article, and adds the imported from (P143) reference accordingly. When there is more than one sitelink, it picks just one (the largest Wikinews, by number of articles).

--Emijrp (talk) 11:42, 25 March 2017 (UTC)
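A hedged pywikibot sketch of that logic; the article counts and the items representing each Wikinews edition are placeholders, since the real bot would derive both from live data:

    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    # Placeholder table: dbname -> (article count, item for that Wikinews edition)
    WIKINEWS_EDITIONS = {
        "enwikinews": (20000, "Q#######"),
        "eswikinews": (9000, "Q#######"),
    }

    def add_imported_from(qid):
        item = pywikibot.ItemPage(repo, qid)
        item.get()
        wikinews_links = [db for db in item.sitelinks if db.endswith("wikinews")]
        if not wikinews_links:
            return
        # when there is more than one sitelink, pick the largest edition
        largest = max(wikinews_links,
                      key=lambda db: WIKINEWS_EDITIONS.get(db, (0, None))[0])
        edition_item = WIKINEWS_EDITIONS.get(largest, (0, None))[1]
        if edition_item is None:
            return
        for pid in ("P31", "P407"):
            for claim in item.claims.get(pid, []):
                if not claim.sources:   # only add a reference where none exists
                    ref = pywikibot.Claim(repo, "P143")   # imported from
                    ref.setTarget(pywikibot.ItemPage(repo, edition_item))
                    claim.addSources([ref])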

For my opinion, see my comment in the previous request for permission. Matěj Suchánek (talk) 17:53, 25 March 2017 (UTC)
  • Pictogram voting comment.svg Comment It's good to add "imported from" as a "source" when importing data from Wikipedia (or Wikinews here), but I don't think it adds much in terms of references. To calculate ratios, one might as well ignore it. For P31, such ratios probably don't add much anyways.
    --- Jura 18:19, 25 March 2017 (UTC)
  • @Matěj Suchánek, Jura1:, are we ready for approval given that the previous one was withdrawn?--Ymblanter (talk) 16:04, 7 April 2017 (UTC)

ZacheBot[edit]

ZacheBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Zache (talkcontribslogs)

Task/s: Import data from pre-created CSV lists.

Code: based on Pywikibot (Q15169668), sample import scripts [7]

Function details:

--Zache (talk) 23:29, 4 March 2017 (UTC)

@Zache:, could you please make a couple of test edits? I do not see any lakes in the contributions of the bot.--Ymblanter (talk) 21:20, 14 March 2017 (UTC)
@Zache: Are you still planning to do this task? If so, please provide a few test edits. --Pasleim (talk) 08:13, 11 July 2017 (UTC)
Hi, I did the vaalidata hack without bot permissions, so that one is done already. The lake import is an ongoing project, currently done using QuickStatements for single lakes, and the CC0 licence screening for larger imports is still the same. There will most likely also be some WLM-related data imports by me this summer, but I am not sure how big (most likely under 2000 items, of which some are updates to existing items and some are new). User Susannaanas started this and I am continuing by filling in the details for the WLM targets. Most likely this WLM work will be done using pywikibot instead of QuickStatements, because with code I can do consistency checks. --Zache (talk) 11:12, 11 July 2017 (UTC)

НСБот[edit]

НСБот (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Nikola Smolenski (talkcontribslogs)

Task/s: Import Zapis database of Wikimedia Serbia.

Code: Code not yet written.

Function details: I need a bot to import data from the database of Zapis trees as a part of Zapis - Sacred Tree project of Wikimedia Serbia. This will essentially be data from the table at sr:Списак записа у Србији#Табела регистрованих записа though there are additional data (such as tree height and similar).

The first bot task should be to fix data about municipalities of Serbia, examples of manual edits: [8] and [9]. Then it should create items about cadastral municipalities of Serbia, then about the trees.

I have previously operated commons:User:NSBot and sr:Корисник:НСБот without any problems. --Nikola (talk) 12:08, 24 February 2017 (UTC)

@Nikola Smolenski:, please register the bot account and make some test edits.--Ymblanter (talk) 08:48, 3 March 2017 (UTC)

YULbot[edit]

YULbot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: YULdigitalpreservation (talkcontribslogs)

Task/s:

  • YULbot has the task of creating new items for pieces of software that do not yet have items in Wikidata.
  • YULbot will also make statements about those newly-created software items.

Code: I haven't written this bot yet.

Function details:

This bot will set the English language label for these items and create statements using publisher (P123), ISBN-13 (P212), ISBN-10 (P957), place of publication (P291), publication date (P577). --YULdigitalpreservation (talk) 18:04, 21 February 2017 (UTC)

good to run a test with a few examples so we can see what you're planning! ArthurPSmith (talk) 20:46, 22 February 2017 (UTC)
Interesting. Where does the data come from? Emijrp (talk) 12:04, 25 February 2017 (UTC)
The data is coming from the pieces of software themselves. These are pieces of software that are in the Yale Library collection. We could also supplement with data from oldversions.com. YULdigitalpreservation (talk) 13:07, 28 February 2017 (UTC)
Please let us know when the bot is ready for approval.--Ymblanter (talk) 21:12, 14 March 2017 (UTC)

JayWackerBot[edit]

JayWackerBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: JayWacker (talkcontribslogs)

Task/s: This will be used to set and remove Quora topic identifiers. It will also update the matches as Quora topics are renamed or merged. We are manually vetting the 250,000 MixNMatch matches of Quora topic to Wikidata entity. This bot will not update other properties of the Wikidata entity.

Code:

Function details: I'm unsure how much detail is necessary

set_quora_identifier(wikidata_id, quora_relative_url)

remove_quora_identifier(wikidata_id, quora_relative_url)

--JayWacker (talk) 17:25, 9 February 2017 (UTC)
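A hedged sketch of the two functions named above, assuming pywikibot; P3417 is assumed here to be the Quora topic ID property and should be verified, and the edit summaries are illustrative only:

    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    QUORA_PID = "P3417"   # Quora topic ID (assumed; verify before running)

    def set_quora_identifier(wikidata_id, quora_relative_url):
        item = pywikibot.ItemPage(repo, wikidata_id)
        item.get()
        claim = pywikibot.Claim(repo, QUORA_PID)
        claim.setTarget(quora_relative_url)
        item.addClaim(claim, summary="set Quora topic identifier")

    def remove_quora_identifier(wikidata_id, quora_relative_url):
        item = pywikibot.ItemPage(repo, wikidata_id)
        item.get()
        to_remove = [c for c in item.claims.get(QUORA_PID, [])
                     if c.getTarget() == quora_relative_url]
        if to_remove:
            item.removeClaims(to_remove,
                              summary="remove outdated Quora topic identifier")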

Could you please explain in more detail on which basis you will remove or update Quora topic ids? How will setting Quora topic ids be different from the current approach with Mix'n'Match? --Pasleim (talk) 13:32, 14 February 2017 (UTC)
@Pasleim: Mix'n'Match may be used more rapidly with a bot-flagged account. @JayWacker: You may have missed this question. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:02, 19 February 2017 (UTC)
@Pasleim: First, we are manually vetting the 255,000 matches that Mix'n'Match identified. As generous as @Pigsonthewing: has been, we are regularly taking his time by going through the hundreds of thousands of matches outside of Mix'n'Match and then handing them to him to set. Additionally, Quora topic names change regularly and are merged together, and this results in the URLs changing. This means that the Wikidata-Quora identifiers will go out of date (though they still redirect to the correct place). We may also be creating Quora topics from Wikidata entities, which means we can set these identifiers directly. We can also resolve the constraint violations more efficiently. JayWacker (talk) 04:04, 21 February 2017 (UTC)
  • Symbol support vote.svg Support. While I'm happy to assist Quora as long as needed, it's right and proper - and welcome - that they should be able to contribute directly. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:58, 21 February 2017 (UTC)
I will approve the bot in a couple of days provided there have been no objections raised.--Ymblanter (talk) 20:19, 26 February 2017 (UTC)
Oops, sorry, I should have noticed earlier. Please make a couple of test edits.--Ymblanter (talk) 21:53, 28 February 2017 (UTC)
We'll do a couple of test edits and I'll get back to you (this may be a few weeks to get to the top of the stack). JayWacker (talk) 18:18, 1 March 2017 (UTC)

YBot[edit]

YBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Superyetkin (talkcontribslogs)

Task/s: import data from Turkish Wikipedia

Code: The bot, currently active on trwiki, uses the Wikibot framework.

Function details: The code imports data (properties and identifiers) from trwiki, aiming to ease the path to Wikidata Phase 3 (to have items that store the data served in infoboxes) --Superyetkin (talk) 16:42, 12 January 2017 (UTC)

It would be good if you could check for constraint violations instead of just blindly copying data from trwiki. These violations are probably all caused by the bot. --Pasleim (talk) 19:26, 15 January 2017 (UTC)

EaasServiceBot[edit]

EaasServiceBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Sharmeelaashwin (talkcontribslogs)

Task/s: A bot which talks to EaaS (Emulation-as-a-Service) to store and retrieve the rendering software and OS for a file format. This helps in opening the files used in digital preservation.

Code:

Function details: It contains the following APIs:

  1. This bot provides an API to store the file format information in Wikidata. This API will be called when the user decides to save the file format information from EaaS.
  2. This bot provides an API to read the rendering software's information from Wikidata, in order to open the file formats in EaaS.

--Sharmeelaashwin (talk) 15:08, 10 January 2017 (UTC)

Which statements do you plan to add? As far as I know there isn't yet a "rendering software" property. --Pasleim (talk) 19:40, 15 January 2017 (UTC)
I would like to add a new page which stores all this information (file format, rendering software and environment). When a user opens a file format with a particular software, we will store this information in Wikidata, and when another user tries to open the same file, we will fetch the data from Wikidata and open the file with the software name retrieved from Wikidata. I will also store the environment (OS and dependent software) information in Wikidata. --Sharmeelaashwin (talk) 11:08, 16 January 2017 (UTC)
  • There is readable file format (P1072), but I don't quite see how you'd store here which one gets used if several render the same format.
    --- Jura 06:22, 17 January 2017 (UTC)
  • How about creating example items manually? ChristianKl (talk) 07:20, 17 January 2017 (UTC)
How much data do you plan to add? ChristianKl (talk) 07:20, 17 January 2017 (UTC)
  • @Jura: "Readable file format" stores the list of file formats that can be opened in a software. I would like to do just the opposite i.e, if I have a file format, I would like to have a list of softwares that can open this file format and also the OS. This has the following advantages
    1. If a user tries to open a file is EaaS(Emulation as Service) application, then from the file format, EaaS can query Wikidata and get a list of softwares that can open the file requested by user.
    2. If any Wikidata user knows that a particular file format can be rendered by a software, then he/she can directly update it in Wikidata which is much easier when compared to updating it i@n PRONOM.
@ChristianKl : I will manually add example items and let you know. In the initial phase I am intending to add a major file formats like .doc, .jpeg, .ppt, .tx but the final goal is to store all the file formats to be stored in Wikidata. I plan to create a table in a Wikidata page and keep updating the same. -- Sharmeelaashwin (talk) 09:16, 18 January 2017 (UTC)
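For the read direction described in point 1 (given a file format, list the software that can open it), a minimal sketch of a query against the Wikidata Query Service using readable file format (P1072); the file-format QID passed in is a placeholder:

    import requests

    QUERY = """
    SELECT ?software ?softwareLabel WHERE {
      ?software wdt:P1072 wd:%s .       # readable file format
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    """

    def software_that_reads(format_qid):
        r = requests.get("https://query.wikidata.org/sparql",
                         params={"query": QUERY % format_qid, "format": "json"},
                         headers={"User-Agent": "EaasServiceBot-sketch/0.1"})
        r.raise_for_status()
        return [b["softwareLabel"]["value"]
                for b in r.json()["results"]["bindings"]]

    # e.g. software_that_reads("Q#######")  # placeholder: item for the file format in question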
Please make some test edits.--Ymblanter (talk) 22:23, 27 January 2017 (UTC)
I am not really happy with this performance--Ymblanter (talk) 00:00, 5 March 2017 (UTC)

DiscogsBot[edit]

DiscogsBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Ocram89 (talkcontribslogs) and AndreaNocera (talkcontribslogs)

Task/s: Update Wikidata entries using the Discogs (Q504063) artists dump (only "Complete and Correct" data).

Code: The code will be, hopefully, uploaded on github in a couple of days.

Function details: The bot uses a filtered XML data dump of artists from Discogs (Q504063); only entries whose <data_quality> element is "Complete and Correct" are used. Once the data is parsed, the bot checks whether an entity already exists; this step is done through a SPARQL query which gets all musician (Q639669) or musical ensemble (Q2088357) items with the name (or alias, or name variation) taken from the XML dump. If the entity already exists, new statements can be inserted (e.g. if a band does not list its members, these can be added using member of (P463) on the entity); if the entity does not exist, a new item is created. If there is more than one entity with the same name, nothing is changed, to avoid involuntarily adding wrong statements. --DiscogsBot (talk) 11:32, 12 December 2016 (UTC)
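A hedged sketch of that lookup step (not the bot's actual query): for simplicity it matches by English label or alias only, and it uses occupation (P106) = musician (Q639669) for people, since persons are not instances of Q639669:

    import requests

    QUERY = """
    SELECT DISTINCT ?item WHERE {
      { ?item wdt:P106 wd:Q639669 } UNION { ?item wdt:P31/wdt:P279* wd:Q2088357 }
      { ?item rdfs:label %(name)s@en } UNION { ?item skos:altLabel %(name)s@en }
    }
    """

    def matching_items(artist_name):
        literal = '"' + artist_name.replace('"', '\\"') + '"'
        r = requests.get("https://query.wikidata.org/sparql",
                         params={"query": QUERY % {"name": literal}, "format": "json"})
        r.raise_for_status()
        return [b["item"]["value"].rsplit("/", 1)[-1]
                for b in r.json()["results"]["bindings"]]

    # 0 matches -> create a new item; 1 match -> add missing statements;
    # more than 1 match -> do nothing, to avoid wrong statements.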

Could you do a few test edits? Which statements, labels and descriptions will you add to a new created item? --Pasleim (talk) 13:09, 12 December 2016 (UTC)
We are doing tests on test.wikidata. The new item will have label, description and aliases and it will have statements Discogs artist ID (P1953) and if it's a band all the members or if it's a member of a group the name of the group. We are also trying to analyze the profile to get some other data like instruments, occupation etc. AndreaNocera (talk) 13:26, 14 December 2016 (UTC)
The edits done on test.wikidata.org look good. But I would still prefer if you could do around 100 edits here on Wikidata to see if you can reliably detect already existing artist items. --Pasleim (talk) 19:51, 15 January 2017 (UTC)

DoctorBot[edit]

DoctorBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: DoctorBud (talkcontribslogs)

Task/s: Import ZFIN gene information and create (or augment) a corresponding Item in Wikidata

Code: Experimental and not yet public

Function details:

  • Import TSV data from http://zfin.org/downloads/gene.txt
  • Extract two columns of that data, one which will identify an Item (a Gene), the other a property of that Gene
  • Create an Item in Wikidata for the Gene
  • Create a Statement in Wikidata that binds the property to the Gene

--DoctorBot (talk) 03:00, 27 November 2016 (UTC)
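A minimal sketch of the TSV step described above, assuming the two relevant columns are the first two; the column positions are placeholders, since the actual layout of gene.txt is not spelled out here:

    import csv
    import requests

    GENE_TSV = "http://zfin.org/downloads/gene.txt"

    def zfin_genes():
        """Yield (identifier, property value) pairs from the ZFIN gene dump.
        Column positions are placeholders; check the file header before use."""
        text = requests.get(GENE_TSV).text
        for row in csv.reader(text.splitlines(), delimiter="\t"):
            if len(row) < 2 or row[0].startswith("#"):
                continue
            yield row[0], row[1]   # placeholder: ID column and property column

    # Each pair would then drive two Wikibase edits: create (or find) the gene item,
    # and add a statement binding the extracted property to it.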

@DoctorBot: The bot owner must use a different account from the bot itself.--Jasper Deng (talk) 03:00, 28 November 2016 (UTC)

DoctorBud (talkcontribslogs) is now declared as the Operator of DoctorBot in this Request.

Could you please make some test edits?--Ymblanter (talk) 16:03, 8 December 2016 (UTC)
@DoctorBud, DoctorBot: Are you still interested in this request?--Jasper Deng (talk) 08:44, 20 December 2016 (UTC)
@Jasper Deng: Yes, I'm still working on DoctorBot's code, but my request for DoctorBot being a Bot operated by DoctorBud is still important, if that's what you are asking. Thanks. --DoctorBud (talk) 00:51, 21 December 2016 (UTC)

WikiLovesESBot[edit]

WikiLovesESBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Discasto (talkcontribslogs)

Task/s: Miscellaneous tasks associated with photo upload campaigns promoted by WM-ES:

  • Assignment of Commons categories to items handled in the campaigns (for example Wiki Loves Earth, Wiki Loves Folk, Wiki Loves Monuments, Photographs from Spanish Municipalities without pictures, and the like).
  • Sourcing of statements for items handled in the campaigns...

Code: Global repository is in here. Bot code is here.

Function details: The bot takes as input a series of lists (so-called annexes in the Spanish Wikipedia, see example here) and extracts the necessary information: mainly the Wikidata item and the Commons category. If found, the bot proceeds as follows:

  • Looks up the Wikidata item.
  • Determines whether a "municipality of Spain" statement is present in the P31 claim. If not, it creates the statement. If present, the statement is sourced to the Spanish Wikipedia.
  • If the source (the list in the Spanish Wikipedia) provides a category, the bot determines whether a Commons category claim is present. If not, it creates the claim. If present, the claim is sourced to the Spanish Wikipedia.
  • Finally, a Commons sitelink for the category provided in the source is inserted if not present. If a gallery was already provided as the Commons sitelink, it is not modified.
  • Inconsistencies are logged during the process.

--Discasto (talk) 10:25, 3 July 2016 (UTC)
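A hedged pywikibot sketch of the per-item steps above; the QID for "municipality of Spain" is left as a placeholder, and a real run would also add the "imported from Spanish Wikipedia" references and the inconsistency logging described in the list:

    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    MUNICIPALITY = "Q#######"   # placeholder: item for "municipality of Spain"

    def process(qid, commons_category):
        item = pywikibot.ItemPage(repo, qid)
        item.get()

        # 1. ensure P31 = municipality of Spain
        targets = [c.getTarget().id for c in item.claims.get("P31", []) if c.getTarget()]
        if MUNICIPALITY not in targets:
            claim = pywikibot.Claim(repo, "P31")
            claim.setTarget(pywikibot.ItemPage(repo, MUNICIPALITY))
            item.addClaim(claim)

        # 2. ensure a Commons category (P373) claim if the source list provides one
        if commons_category and "P373" not in item.claims:
            claim = pywikibot.Claim(repo, "P373")
            claim.setTarget(commons_category)
            item.addClaim(claim)

        # 3. ensure a commonswiki sitelink, without touching an existing gallery link
        if commons_category and "commonswiki" not in item.sitelinks:
            item.setSitelink({"site": "commonswiki",
                              "title": "Category:" + commons_category})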

Symbol support vote.svg Support I strongly support this request. --Rodelar (talk) 22:04, 3 July 2016 (UTC)
Symbol support vote.svg Support I also support. --Harpagornis (talk) 15:00, 4 July 2016 (UTC)
Symbol support vote.svg Support I support this request. Ivanhercaz (talk) 16:17, 4 July 2016 (UTC)
Symbol support vote.svg Support I support this request. --Bauglir (talk) 16:28, 4 July 2016 (UTC)
Symbol support vote.svg Support I support this request. --ElBute (talk) 16:47, 4 July 2016 (UTC).
Symbol support vote.svg Support I support this request.--Pedro J Pacheco (talk) 20:14, 4 July 2016 (UTC)
Symbol support vote.svg Support The bot operator is reliable and knows what he does Poco2 21:07, 4 July 2016 (UTC)
Symbol support vote.svg Support I support this request. The operator has done a good work with other bots in different projects. --Millars (talk) 15:47, 5 July 2016 (UTC)
Symbol support vote.svg Support I support this request. --Dorieo (talk) 17:41, 6 July 2016 (UTC)
Sorry, Jura, I missed your comment. I have to say that I don't fully understand it (mainly the part related to the mismatch in the number of municipalities). With regard to the second part, I will patch the code to also consider subclasses. However, parroquia (Q3333265) does not apply, as a parroquia is a subdivision of a municipality. The lists we're handling have been reviewed several times by the WM-ES members and all the items are actually municipalities. Smaller subdivisions can be considered in future editions, but not now. Therefore, my only concern relates to the subclasses (I hadn't actually considered that possibility). Best regards --Discasto (talk) 22:02, 22 July 2016 (UTC)
And I didn't notice your answer. There are several possible reasons for the mismatch in the number of municipalities: we could have already an item for the municipality, but it just isn't linked to eswiki. The easiest way to solve this would be to add the statements and then check the result for duplicates (it could also be done in advance, but this may be more complicated).
As far as "concejo of Asturias" is concerned, you could add both or replace it. Whatever suits interested editors best.
The "parroquia" question seems minor (11 items currently): If you look at the query result you will notice that some items have this in P31 in addition. This can mean that the article in some other wiki is about the parroquia or there is some other mixup. These items may need to be split.
--- Jura 08:42, 17 August 2016 (UTC)


  • It'll be great if some active editors of Wikidata could give their opinions. Canvassing of users with a low amount of contributions doesn't help. Sjoerd de Bruin (talk) 14:55, 7 July 2016 (UTC)
I took a look at contribs - it looks like a lot of entries have already been made, but the bot was blocked as unapproved. From my review of the entries made the bot seems to be operating reasonably. However, adding a reference of "imported from xx wikipedia" is barely better than no source at all, I'm not sure this is really helpful. If there's an actual es.wikipedia.org page that is the source of the information, providing that via "reference URL" and "retrieved on" properties would be more useful. An external source for this data would be much better. ArthurPSmith (talk) 14:42, 8 July 2016 (UTC)
I have no strong opinion on this. I do agree on providing an external source if available. It's not the case in most of the situations we're handling. Therefore, I'll simply skip this step. In fact, the core functionality (which I'm currently doing by hand) was related to setting commons categories. As we're handling all the items in the list, it seemed sensible to add sources. If you feel it's useless (unless a proper source is provided), I'll skip this step. Thanks for providing feedback --Discasto (talk) 22:45, 12 July 2016 (UTC) PS: yes, it's been blocked in the middle of a task that nowadays I have to do by hand. I don't really understand this block. Seems to me the typical bureaucratic behaviour that harms more than helps
I am going to approve the bot tomorrow provided there have been no objections.--Ymblanter (talk) 09:46, 13 July 2016 (UTC)
It would be good to have an answer to my question. We don't want to end up with even more duplicates.
--- Jura 12:34, 15 July 2016 (UTC)
@Ymblanter, Discasto: Please see my comment above.
--- Jura 08:43, 17 August 2016 (UTC)
@Jura1: I saw it weeks ago (and I answered :-), see answer on 22 July... I assumed you had this page in your watch list) --Discasto (talk) 08:52, 17 August 2016 (UTC)
@Discasto: Well, generally I notice, but here I missed it. Bot requests aren't exactly my preferred stuff ; ). Did you notice my comment from today?
--- Jura 08:54, 17 August 2016 (UTC)

Pictogram voting comment.svg Comment I withdraw this request. However, may I ask for the account to be unblocked? It will not be active, but keeping it blocked sincerely seems like overkill. Best regards --Discasto (talk) 21:50, 23 August 2016 (UTC)

@Jura1:, @Discasto:: The task seems useful, is there any chance you can agree and proceed with the task?--Ymblanter (talk) 07:59, 24 August 2016 (UTC)
I think it's essentially a question of checking the result. This could be done after addition. A way to flag former municipalities needs to be determined (by end date and/or with some Q19730508 item). In the meantime, Abián is working with Spanish municipalities (Wikidata:Bot_requests#Mayors_of_Spain).
--- Jura 08:42, 26 August 2016 (UTC)

MatSuBot 6[edit]

MatSuBot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Matěj Suchánek (talkcontribslogs)

Task: Convert HTML entities in terms and maybe statements to regular text.

Code: Not yet decided on the implementation.

Function details: The biggest problem at the moment is querying for items which have such errors (if I don't find any other possibility, I will try to combine SQL and PWB). --Matěj Suchánek (talk) 19:12, 1 July 2016 (UTC)
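A minimal sketch of the conversion step itself for labels, assuming pywikibot and Python's html module; as noted above, the open problem is finding the affected items, so this sketch assumes the QIDs come from an external SQL or dump scan:

    import html
    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    def fix_terms(qid):
        item = pywikibot.ItemPage(repo, qid)
        item.get()
        new_labels = {lang: html.unescape(text)
                      for lang, text in item.labels.items()
                      if html.unescape(text) != text}
        if new_labels:
            item.editLabels(new_labels, summary="convert HTML entities to plain text")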

Please make some test edits.--Ymblanter (talk) 14:47, 5 July 2016 (UTC)
For your information, I put this Time2wait.svg On hold since I am not able to query for the items. I hope to find a solution in the near future. Matěj Suchánek (talk) 18:49, 20 June 2017 (UTC)

1-Byte-Bot[edit]

1-Byte-Bot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: 1-Byte (talkcontribslogs)

Task/s: Import census data from the Turkish Statistical Institute.

Code: Based on pywikibot

Function details:

--1-Byte (talk) 15:22, 2 March 2016 (UTC)

Update: Currently on hold as it's not entirely clear how to cite the data. --1-Byte (talk) 08:58, 3 March 2016 (UTC)
@1-Byte: Do you now know how to cite the data? Mbch331 (talk) 20:57, 25 August 2017 (UTC)

Phenobot[edit]

Phenobot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Jjkoehorst (talkcontribslogs)

Task/s: The first step will be to improve the lineage annotation of organisms, including taxon identifiers, correct species names and corresponding references, using the UniProt Taxonomy database. The next step will be to add missing organisms to Wikidata, together with phenotypic information such as biosafety level, oxygen requirements and other features. Continuous discussion can be found at User:Phenobot/Discussion

Code:https://bitbucket.org/jjkoehorst/wikidatabots

Function details: This bot is based on the ProteinBoxBot framework. It will use the UniProt Taxonomy SPARQL endpoint for data extraction and will initially work on completing existing entries as far as possible with correct names and taxon identifiers; missing species will be added to WD. For strains, existing phenotypic information can be complemented from various sources which are currently under investigation, such as GOLD or DSMZ. --jjkoehorst (talk) 15:13, 4 February 2016 (UTC) Abbe98
Achim Raschka (talk)
Brya (talk)
Dan Koehl (talk)
Daniel Mietchen (talk)
Delusion23 (talk)
Faendalimas
FelixReimann (talk)
Infovarius (talk)
Joel Sachs
Josve05a (talk)
Klortho (talk)
Lymantria (talk)
Michael Goodyear
MPF
PhiLiP
Andy Mabbett (talk)
Prot D
pvmoutside
Rod Page
Soulkeeper (talk)
Tinm
Tommy Kronkvist (talk)
TomT0m
Pictogram voting comment.svg Notified participants of WikiProject Taxonomy

@Succu: Can you have a look at this request? --Pasleim (talk) 10:32, 5 February 2016 (UTC)
I have some problems with the task "correct species names": NCBI is not a nomenclatural database. It contains spelling errors like other databases do too. And I have problems with this kind of sourcing. The NCBI ID is already referenced; nothing is imported from UniProt. The disclaimer tells us „The NCBI taxonomy database is not an authoritative source for nomenclature or classification - please consult the relevant scientific literature for the most reliable information.“ --Succu (talk) 11:40, 5 February 2016 (UTC)
Here the Bot removed taxon name (P225). --Succu (talk) 12:10, 5 February 2016 (UTC) PS: Pseudomonas putida 10-23 (Q22661287) P225 is missing. --Succu (talk) 07:24, 6 February 2016 (UTC)
I agree with Succu. Why go change species names, based on UniProt? Could do serious damage. And indeed that kind of sourcing is unwanted and adds nothing: database is slow enough as it is. - Brya (talk) 11:58, 5 February 2016 (UTC)
This proposal does not seem to be mature. The Uniprot taxonomy database is a customized version of the NCBI taxonomy database, which itself is not reliable for taxonomy anyway. It is currently not clear if the bot owner knows enough about taxonomy and nomenclature to understand the issues associated with Wikidata taxon items. Also the proposed use of imported from (P143) does not seem appropriate.
Nevertheless my understanding is that many of this bot's contributions would be made in microbiology, and the issues would be a little different if its contributions were limited to this area. Otherwise I see no reasons to prevent the bot from adding “biosafety levels, oxygen requirements and other [such] features”.
Tinm (d) 18:29, 5 February 2016 (UTC)
Yes, the main focus of this bot will be within microbiology and I can restrict the bot to remain within prokaryotes. About the naming: what I am currently doing is to leave the name alone if it exists in UniProt Taxonomy as either "other name" or "scientific name". But I can leave the name as it is, as I am mostly relying on the taxonomic identifier from NCBI/UniProt. My main priority is to have the NCBI Taxonomy identifier correct / filled in, so that I can include the phenotypic characteristics and also easily verify whether an organism item has been created and, if not, create it. I can also skip adding references if one is already available. --jjkoehorst (talk) 06:45, 6 February 2016 (UTC)
Yes, this taxon name is pretty bad. And again, the fact that the rank is that of species does not need a reference (this is so by definition), and as there is a link to NCBI, the fact that the taxon name is accepted by NCBI does not need to be repeated in the form of a reference to taxon name. - Brya (talk) 07:44, 6 February 2016 (UTC) -also beyond understanding - Brya (talk) 07:51, 6 February 2016 (UTC) - And "instance of taxon" means that "taxon name" is present in the item. UniProt cannot know anything about that, so adding a reference to "instance of taxon" is pure misrepresentation. - Brya (talk) 07:58, 6 February 2016 (UTC)
Sorry about those names; I'll then restrict the bot to only prokaryotes if you prefer, and to only updating missing names and NCBI Taxonomy information. When that works out well I'll make some property proposals for the phenotypic information as stated earlier, OK? --jjkoehorst (talk) 09:31, 6 February 2016 (UTC)
If that means 1) only missing names of prokaryotes and 2) sourcing only for NCBI Taxonomy information, then yes, OK. - Brya (talk) 13:00, 6 February 2016 (UTC)
Looks like the databases are out of sync. NCBI Taxonomy ID (P685)=208964 gives Pseudomonas aeruginosa PAO1 (www.ncbi.nlm.nih.gov/taxonomy) and Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228) (www.uniprot.org/taxonomy). This explains „adjustments“ like this one. --Succu (talk) 11:05, 6 February 2016 (UTC)
Looks like UniProt provides five separate names, rolled up into one entry? - Brya (talk) 13:00, 6 February 2016 (UTC)
There is a mapping between NCBI Taxonomy ID (P685) and a so called „Official (scientific) name“ used by UniProt. So maybe we need a qualifier for P685 to indicate this name. --Succu (talk) 16:44, 6 February 2016 (UTC)
Yes, I had an email conversation with UniProt and this was the reply about that case: The idea is not to use a concise name. A same strain may be known by different names because it has been deposited in different organizations (institutions, private companies, etc.) with different names. So we try to track these co-identical strain names used by the major concerned organizations for a specific strain. This name is stored as scientificName and all the variants are stored among the other names. --jjkoehorst (talk) 19:30, 6 February 2016 (UTC)
So what's your conclusion? BTW: I stumbled over User:Phenobot/Discussion, which looks like an outline of the intended bot task, but not mentioned here. --Succu (talk) 20:28, 6 February 2016 (UTC)
Well, in one way it makes sense to use a general nomenclature which encapsulates all possible extra names, but it is not the true scientific name. Maybe a taxon synonym entry could be used which lists other names belonging to this organism. Yes, the discussion page is meant to discuss the roadmap after the general taxon identification and naming is completed; sorry that I did not mention it here, but in my opinion it was not complete yet. Feel free to comment on it if you like... --jjkoehorst (talk) 08:02, 7 February 2016 (UTC)
Strictly speaking these are not scientific names at all. The ICNP does not cover names at a rank lower than subspecies. AFAIK there is no formal system for naming strains, so this may well happen on an ad hoc basis, or according to a local standard. In fact, it would help somewhat not to put these in "taxon name". - Brya (talk) 08:29, 7 February 2016 (UTC)
Then I would suggest that the names currently in WD should correspond to the NCBI nomenclature or to any of the UniProt names (scientific names / other names); if this is not the case, then it should be either the scientific name from NCBI or, if there is no reference available, the one from UniProt. What do you think? And where would you place the other names? As a common name or something else? --jjkoehorst (talk) 08:40, 7 February 2016 (UTC)
? The names in NCBI/Uniprot are not scientific names (not regulated by a Code of nomenclature). The most obvious way to handle strains would be to have a property "strain name" (perhaps to be combined with "parent taxon", etc). - Brya (talk) 09:33, 7 February 2016 (UTC)
My consideration are the same. --Succu (talk) 10:10, 7 February 2016 (UTC)
I agree a strain property should then be created which specifies the name of a strain. However, taxon name then becomes obsolete, at least for strains, if I am correct. The elements that are obligatory for strains would then be parent taxon, taxon rank, NCBI Taxonomy ID, general labels and instance of. Is there anything else that can be used with the current properties? --jjkoehorst (talk) 11:49, 7 February 2016 (UTC)
Yes, this new property should be used instead of P225. This would reduce "Format" violations of P225 too. --Succu (talk) 12:54, 7 February 2016 (UTC)
Sounds good. Who is going to propose the new property, and can this strain name then also contain multiple values, such as synonyms of the strain name, or should another property be made for that? --jjkoehorst (talk) 14:47, 7 February 2016 (UTC)
I think we need a second property UniProt name to model the relationship to the NCBI ID. In the case of strains we could use aliases to add the name variants. You can propose them at Wikidata:Property proposal/Natural science. --Succu (talk) 18:49, 7 February 2016 (UTC)
A property "UniProt" to link to the UniProt-entries may be handy. Not sure what else you mean, as UniProt-entries may concern regular taxa as well as strains and whatever else UniProt includes. - Brya (talk) 06:40, 8 February 2016 (UTC)
I am not much in favour of multiple names in one item, and including out-of-use names beside the current name seems like a recipe for disaster. But we really do need a separate property "taxon synonym (string)" beside the present "taxon synonym [item]". - Brya (talk) 15:53, 7 February 2016 (UTC)
Yes we should request for a taxon synonym string variant. Then by default it would be the scientific name of the NCBI nomenclature if no better name is available? --jjkoehorst (talk) 19:50, 7 February 2016 (UTC)
Synonyms are an area full of hidden dangers. What we may really need are:
  • "taxon synonym, homotypic (item)"
  • "taxon synonym, heterotypic (item)"
  • "taxon synonym, homotypic (string)"
  • "taxon synonym, heterotypic (string)"
Especially heterotypic synonyms may vary strongly, depending on point of view (references!). Brya (talk) 06:40, 8 February 2016 (UTC)
I looked into Property:P1843, which is a common name for a given taxon. As a basis we could use the NCBI nomenclature for strains (and/or others?), and over time add the homotypic/heterotypic naming. Shall I run a test with the restricted settings I have now? Only bacteria, no name updating if there is a name available, and no reference adding if the value is already present? --jjkoehorst (talk) 08:01, 8 February 2016 (UTC)
@Brya: Regarding how to handle synonyms, I have thought of a way of doing things that would solve a very big part of the issues we encounter with the current one. I'm going to make a post about that on the project talk page when I'll have a bit of time. It would imply significant changes but I really believe it would answer many issues efficiently. Anyway, I guess you will see when I put it up. —Tinm (d) 02:34, 9 February 2016 (UTC)
I will be most interested to see what you come up with. - Brya (talk) 06:13, 9 February 2016 (UTC)

Greetings all. I am part of the GeneWiki team and I am adding genes and proteins for bacteria under our MicrobeBot (talkcontribslogs) account. see: MicrobeBot Task Page For my project it is important that there remain distinct strain items with NCBI taxonomy identifiers so I can link genes and proteins to them via found in taxon (P703). Just a thought, but we could distill some of the views here in a mockup of a Wikidata strain item in this table below? Using Pseudomonas aeruginosa PAO1 (Q21065234) as an example. I added some of the basics that are there for strain items now. I personally think a new 'NCBI strain name' type of property would be a good thing to have as these strain names are directly linked to the NCBI Taxonomy ID. Putmantime (talk) 18:46, 9 February 2016 (UTC)

Property | Description | Datatype | Expected value (if not listed, see property definition)
P225 | taxon name | String | Species name? From NCBI, UniProt?
P??? | strain name | String | Strain name from NCBI, UniProt, etc.
P171 | parent taxon | Item | Bacterial species item, e.g. Pseudomonas aeruginosa (Q31856)
P105 | taxon rank | Item | Strain, e.g. strain (Q855769)
P685 | NCBI Taxonomy ID | String | 208964

What we are talking about is this:

Property | Description | Datatype | Expected value (if not listed, see property definition)
P??? | strain name | String | Strain name from NCBI, UniProt, etc., e.g. Pseudomonas aeruginosa PAO1 (Q21065234)
P171 | parent taxon | Item | Bacterial species item, e.g. Pseudomonas aeruginosa (Q31856)
P105 | taxon rank | Item | Strain, e.g. strain (Q855769)
P685 | NCBI Taxonomy ID | String | 208964
P??? | UniProt ID | String | From UniProt, different from UniProt protein ID (P352)

- Brya (talk) 04:42, 10 February 2016 (UTC)

I agree. P225, P1420 and P1843 should not be taken from NCBI/UniProt. No items should be created on this basis. --Succu (talk) 06:51, 10 February 2016 (UTC) PS: I added UniProt protein ID (P352) and now miss something like a UniProt name. --Succu (talk) 08:02, 10 February 2016 (UTC)
Not sure what you mean by "UniProt name". Is this something like "Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228)", which to me does not look like a name but five names, for what may be (deemed to be) one strain. - Brya (talk) 11:39, 10 February 2016 (UTC)
Yes, the so called „Official (scientific) name“ used by UniProt mapped to NCBI Taxonomy ID (P685). --Succu (talk) 12:01, 10 February 2016 (UTC)
It is a long list, and many names are regular scientific names. Could you point out a few examples? - Brya (talk) 12:07, 10 February 2016 (UTC)
  • 634452 ← Acetobacter pasteurianus (strain NBRC 3283 / LMG 1513 / CCTM 1153)
  • 4024 ← Acer saccharum
  • 441768 ← Acholeplasma laidlawii (strain PG-8A)
  • 237531 ← Actinomycete sp. (strain K97-0003)
  • 928294 ← Human adenovirus C serotype 1 (strain Adenoid 71)
  • 262698 ← Brucella abortus biovar 1 (strain 9-941)
  • 48984 ← Pantoea agglomerans pv. gypsophilae
  • 45222 ← Parana mammarenavirus (isolate Rat/Paraguay/12056/1965)
--Succu (talk) 12:23, 10 February 2016 (UTC)

But not all these names are unique to UniProt. For example, Acer saccharum is a regular botanical name, and Pantoea agglomerans pv. gypsophilae appears to be in fairly widespread use, as is Brucella abortus biovar 1 (strain 9-941). - Brya (talk) 17:32, 10 February 2016 (UTC)

My thought was that jjkoehorst wants to integrate these names somehow. If the speclist is important for the planned bot's job I can provide some statistics. --Succu (talk) 18:36, 10 February 2016 (UTC)
Eventually I would like to create a comprehensive but still usable taxonomy resource where people can easily search for organisms and their phenotypic characteristics, and where, when a new strain is sequenced, its information can easily be integrated into WD according to a defined data model. However, a solid foundation needs to be established first, and that is what I was thinking of. In general the primary identifier is the NCBI taxonomic number, which can be completed with information from NCBI scientific names and UniProt scientific / other names. If for obvious reasons this would introduce too many errors, or is not in line with the idea of how we should define a strain, then that is perfectly fine with me. What has driven me from the beginning is that I want to connect phenotypic information from multiple resources to taxonomic identifiers and the corresponding genetic makeup. I could of course do this on my own machine as my own little project and it would work out fine, but no one else could benefit from it, and that is why I started working on the idea of this Phenobot (hence the name...). In the discussion page of the bot, as mentioned by Succu, I am expanding this idea further with possible phenotypic characteristics that I can get my hands on and that could theoretically be integrated into WD, but I am still writing this at User:Phenobot/Discussion. --jjkoehorst (talk) 21:04, 10 February 2016 (UTC)
As an example these are statements that would be interesting to add. Not all have properties and I am preparing for that.
Property | Description | Datatype | Expected value
P1604 | biosafety level | Item | Level 1 Q18396533, Level 2 Q18396535, Level 3 Q18396538, Level 4 ... see Q21079489
P2043 | length / size | String | 902320 bp Q21481789
P??? | GC content | Float |
P??? | Gram staining | Item | Gram positive Q857288, Gram negative Q632006
P??? | Pathogenic to | Item | Human, Plant, Animal, etc.
P??? | Motility | Item | Chemotactic (Chemotaxis) Q658145, Motile Q3359, Nonmotile (not yet found)
P??? | Environment | Item or String | soil, seawater, marine sediment, forest soil, etc.
P??? | Temperature range | Item | Hyperthermophile Q1784119, Mesophile Q669652, Psychrophile Q913343, Thermophile Q834023
P2076 | Temperature (optimal temperature) | | Q21079489

--jjkoehorst (talk) 09:11, 11 February 2016 (UTC)

If all that is to be included in an item, it becomes understandable that Succu would like a UniProt name, and (presumably?) a separate item for each such UniProt entity. - Brya (talk) 17:26, 11 February 2016 (UTC)
If I understand you correctly, you mean to store the biosafety/Gram/temperature/etc. in a UniProt item? These are generic features from different sources (DSMZ/GOLD/etc.) and are linked via the NCBI Taxonomy ID, so in that case it would not make sense to store them under a UniProt name entry. --jjkoehorst (talk) 19:46, 11 February 2016 (UTC)

Back to the roots[edit]

Symbol oppose vote.svg Oppose: Back to the roots. The „code“ is protected. I see no reactions to error reports. The task is obscure. jjkoehorst, please roll back your bot's contributions. --Succu (talk) 22:32, 11 February 2016 (UTC)

The code is unlocked and all edits have been rolled back. Please let's continue discussing what kind of shape would be acceptable for the phenotypic information. --jjkoehorst (talk) 06:51, 18 February 2016 (UTC)

I think there is great value in elements of what is proposed, and it would make the microbial data on Wikidata a much richer resource. Metadata such as biosafety level, Gram -/+, etc. would be very useful, but getting taxonomy identifiers and names from UniProt may not be the best source. I think it would benefit this proposal to have a clear picture of what the scope of the project would be, and a clear definition of each bot task. Putmantime (talk) 23:16, 11 February 2016 (UTC)

Putmantime, mind to help? --Succu (talk) 23:21, 11 February 2016 (UTC)
Succu Yes, definitely... can we keep the discussion going on this proposal? I think it has merit, but it needs to be clearer. The naming issue for subspecies items seems to have thrown a wrench in things. Personally I think NCBI is a good authority for strain names, because the name was submitted by the researcher that submitted the sequence data to NCBI, and that is when the NCBI Taxonomy ID was generated, as well as the genome IDs. It is not a scientific name, though, nor consistently formatted. I view it as an appropriate label, and maybe as a new 'strain name' property, but agree it shouldn't be a taxon name. Any synonyms could be aliases, IMHO. Putmantime (talk) 23:34, 11 February 2016 (UTC)
I am in the process of rolling back the changes made by the bot. I think the focus of the conversation has shifted towards the naming issues, which still exist and need to be discussed thoroughly. Currently existing names will not be modified by the bot; its main focus is on the metadata that is available at various resources through the NCBI taxonomic identifier, which will not interfere with current information. I know that I initially started with the naming, but the main focus is on the metadata. Hopefully we can keep the discussion going on the naming scheme and microbial metadata to come to a good agreement and improve the quality of information in Wikidata. --jjkoehorst (talk) 17:36, 12 February 2016 (UTC)
In the NCBI Taxonomy, strains have no rank. We should find a consensus on whether stating taxon rank (P105)=strain (Q855769) is OK. Otherwise we can use instance of (P31)=strain (Q855769) with taxon rank (P105)=novalue. --Succu (talk) 18:51, 12 February 2016 (UTC) E.g. Shigella flexneri 2a str. 301 (Q21102941), Putmantime. --Succu (talk) 22:13, 12 February 2016 (UTC)
There are similar cases elsewhere: "virus" as a subspecific entity is not regulated by a Code of nomenclature. This goes also for "forma specialis", "pathovar", etc. We should have a structure for this. - Brya (talk) 06:17, 13 February 2016 (UTC)
Yes we should. If I remember right f.sp. is used by IF and MycoBank as a rank. Strongly related to this bots task is the question of Candidatus (Q857968). --Succu (talk) 19:18, 13 February 2016 (UTC)
Yes, forma specialis is used by IF and MycoBank as a rank, but that does not make it a rank. And, yes, "Candidatus" is a similar problem case. - Brya (talk) 09:55, 14 February 2016 (UTC)

Dexbot[edit]

Dexbot (talkcontribsnew itemsSULBlock logUser rights logUser rights)
Operator: Ladsgroup (talkcontribslogs)

Task/s: Auto-transliterating for names of humans

Code: Based on pwb, probably publish it soon.

Function details: The code analyses dumps of Wikidata and can create an auto-transliteration system for any given pair of languages based on them. I started with Persian and Hebrew (some test edits: [17] [18]) --Amir (talk) 18:14, 7 April 2015 (UTC)
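A hedged sketch of the data-collection step this implies: harvesting Persian/Hebrew label pairs of humans (P31 = Q5) from a Wikidata JSON dump, which could then feed whatever transliteration model the bot actually builds; the dump path is a placeholder:

    import bz2
    import json

    def label_pairs(dump_path, lang_a="fa", lang_b="he"):
        """Yield (lang_a, lang_b) label pairs of human items from a JSON dump."""
        with bz2.open(dump_path, "rt", encoding="utf-8") as dump:
            for line in dump:
                line = line.strip().rstrip(",")
                if not line.startswith("{"):
                    continue                      # skip the surrounding [ and ]
                entity = json.loads(line)
                claims = entity.get("claims", {})
                is_human = any(
                    c.get("mainsnak", {}).get("datavalue", {})
                     .get("value", {}).get("id") == "Q5"
                    for c in claims.get("P31", []))
                labels = entity.get("labels", {})
                if is_human and lang_a in labels and lang_b in labels:
                    yield labels[lang_a]["value"], labels[lang_b]["value"]

    # e.g. for fa, he in label_pairs("wikidata-latest-all.json.bz2"): ...  # placeholder path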

  • Pictogram voting comment.svg Comment, please let me know when you try your system for some cyrillic language. I'd like to see it myself. --Infovarius (talk) 14:10, 8 April 2015 (UTC)
@Infovarius: I work on pairs of languages like fa and he (where the bot adds the Persian transliteration based on the Hebrew one and vice versa). Which pair of languages do you suggest? en and ru? Amir (talk) 11:54, 9 April 2015 (UTC)
Probably you should have stated this in your request. Your phrase "I started with" has encouraged me :) No, I don't suggest Russian as I understand the complexity of the task. --Infovarius (talk) 13:16, 10 April 2015 (UTC)
@Infovarius: I don't think Russian is too complicated to abandon. I took care of lots of different issues, including country of citizenship, etc., so it's not hard for this bot. I asked you which language you think is the best pair for Russian *to start with*. Amir (talk) 21:11, 10 April 2015 (UTC)
Will the bot be able to detect delicate labels as in King An of Han (Q387311)? --Pasleim (talk) 19:24, 13 April 2015 (UTC)
It probably skips them or makes a correct transliteration (depending on the language), but I can't say for sure. Let me test. Amir (talk) 13:33, 15 April 2015 (UTC)
Are we ready for approval here?--Ymblanter (talk) 16:08, 15 April 2015 (UTC)
  • Just a caveat when dealing with Chinese languages: Chinese to Latin script (and vice versa) transliterations are rarely standardized. For example, Alan Turing's given name might be transliterated into 艾伦 or 阿兰 (as in the case of Alan Moore (Q205739)) or 亚伦 (as in the case of Alan Arkin (Q108283)). These Chinese characters roughly resemble "Alan" when pronounced, but due to regional differences (i.e. mainland China, Taiwan, Hong Kong, etc.), they result in different transliterations. Even when two people's names are transliterated in the same region, they can be different. There is simply no standardization on this matter. —Wylve (talk) 14:53, 23 April 2015 (UTC)
    hmm, User:Wylve: Just a question: Is it wrong to put "亚伦" for Alan in Alan Turing? Amir (talk) 12:36, 25 April 2015 (UTC)
    It's not wrong, but it might not be the only way people call Alan Turing in Chinese. The lead sentence of Turing's article on zhwiki mentions that "Alan" is also transliterated as 阿兰. —Wylve (talk) 20:48, 25 April 2015 (UTC)
    @Wylve: I made 50 auto-transliterations [19], please check and say if anything is wrong or unusual. Thanks Amir (talk) 20:05, 16 May 2015 (UTC)
    I can't verify every name, since some of those people aren't mentioned in Chinese news sources. My standard of what is "wrong" or "unusual" is whether the transliterations you've produced are used predominantly in reliable and reputable sources. It is hard to judge sometimes, as there is a variety of transliterations used. For instance:
  • Jonathan Ross is transliterated as 强纳·森罗斯 and also 喬納森·羅斯
  • Leonard B. Jordan is also transliterated as 萊昂納德·B·喬丹
  • Jimmy Bennett is also transliterated as 吉米·本内特, 吉米班奈, 吉米班奈特.
  • Jason Lee is also named 杰森·李.
  • "Scott" from A. O. Scoot is also transliterated as 史考特.
All of your edits should be fine if read in Chinese, as they all sound like their English name. Also, I have found this page ([20]), which documents Xinhua News Agency (Q204839)'s official transliterations of names. These transliterations are considered official only in Mainland China. —Wylve (talk) 21:58, 16 May 2015 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Ladsgroup, Wylve: Does this look okay for an approval, or is there something we're missing? I don't speak (or read, for that matter) Chinese  Hazard SJ  05:40, 28 December 2015 (UTC)

  • Amir: Sounds cool. Regarding the he-fa pair
Tagging Amire80 and Eldad who may add some other advice. Eran (talk) 18:53, 4 January 2017 (UTC)
Well, the last time people talked on this page was a year and a half ago. I need to search to find the script and check. I'll do it soon. Amir (talk) 19:12, 5 January 2017 (UTC)
  • @Ladsgroup: Only human names? How about geographical objects (populated places, rivers, etc.)? Right now I'm thinking to transliterate manually some batches of names of Ukrainian localities and to harvest them in WD; should I leave this task for your bot?:) --XXN, 14:49, 12 May 2017 (UTC)
    I don't think the AI would be good enough to do that for now. I'm planning to use w:LSTM in the near future and in that case we might do some experiments soon. Amir (talk) 14:56, 12 May 2017 (UTC)

KunMilanoRobot[edit]

KunMilanoRobot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Kvardek du (talk • contribs • logs)

Task/s:

  • Add French 'intercommunalités' to French commune items (example)
  • Add French commune populations
  • Correct INSEE codes of French communes

Code:

Function details: Takes the name of the 'communauté de communes' from the INSEE base and, if necessary, adds it to the item with a point in time and a source. Uses pywikipedia. --Kvardek du (talk) 19:27, 21 January 2014 (UTC)
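For illustration, a rough pywikibot sketch of the edit described above; the QIDs, the date and the edit summary are placeholders, and the qualifier choice (point in time vs. start/end time) is discussed further down.

    # Sketch only: add located in the administrative territorial entity (P131)
    # with a point in time (P585) qualifier, skipping items that already have it.
    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    def add_intercommunalite(commune_qid, epci_qid, year):
        item = pywikibot.ItemPage(repo, commune_qid)
        item.get()
        for existing in item.claims.get("P131", []):
            target = existing.getTarget()
            if target is not None and target.getID() == epci_qid:
                return  # already recorded
        claim = pywikibot.Claim(repo, "P131")
        claim.setTarget(pywikibot.ItemPage(repo, epci_qid))
        item.addClaim(claim, summary="Adding intercommunalité from INSEE data")
        qualifier = pywikibot.Claim(repo, "P585")  # point in time
        qualifier.setTarget(pywikibot.WbTime(year=year, month=1, day=1))
        claim.addQualifier(qualifier)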

IMO the point in time qualifier isn't valid here, as the property isn't time-specific. -- Bene* talk 15:10, 22 January 2014 (UTC)
Property:P585 says "time and date something took place, existed or a statement was true", and we only know the data was true on January 1st, due to numerous changes in the French administrative organization. Kvardek du (talk) 12:18, 24 January 2014 (UTC)
Interesting, some comments:
  • Not sure that "intercommunalités" are really administrative divisions (they are built from the bottom rather than from the top). part of (P361) might be more appropriate than located in the administrative territorial entity (P131).
  • Populations are clearly needed, but I think we should try to do it well from the start, and that is not easy. That seems to require a separate discussion.
  • INSEE code correction seems to be fine.
  • Ideally, the date qualifiers to be used for intercommunalité membership would be start time (P580) and end time (P582) but I can't find any usable file providing this for the whole country. --Zolo (talk) 06:37, 2 February 2014 (UTC)
Kvardek du: can you add « canton » and « pays » too? (canton is a bit complicated since some cantons contain only fractions of communes)
Cdlt, VIGNERON (talk) 14:01, 4 February 2014 (UTC)
Wikipedia is not very precise about administrative divisions (w:fr:Administration territoriale). Where are the limits between part of (P361), located on terrain feature (P706) and located in the administrative territorial entity (P131)?
Where is the appropriate place for a discussion about population ?
VIGNERON: I corrected the INSEE codes, except for the islands: the same problem exists on around 50 articles due to confusion between articles and communes on some Wikipedias (I think).
Kvardek du (talk) 22:26, 7 February 2014 (UTC)
@Bene*, Vogone, Legoktm, Ymblanter, The Anonymouse: Any 'crat to comment?--GZWDer (talk) 14:37, 25 February 2014 (UTC)
I'm still not familiar with the "point in time" qualifier. What about "start date", since you mentioned the system changed at the beginning of this year? Otherwise it might be understood as "this is only true/happened on" some date. -- Bene* talk 21:04, 25 February 2014 (UTC)
Property retrieved (P813) is for the date the information was accessed and is used as part of a source reference. point in time (P585) is for something that happened at one instance. It is not appropriate for these entities which endure over a period of time. Use start time (P580) and end time (P582) if you know the start and end dates. Filceolaire (talk) 21:19, 25 March 2014 (UTC)

Support if the bot uses start time (P580) and end time (P582) instead of point in time (P585) --Pasleim (talk) 16:48, 28 September 2014 (UTC)

@Kvardek du: Do you still plan to run the bot? If so, could you please again do some test edits using start time (P580) and end time (P582) instead of point in time (P585)? --Pasleim (talk) 07:52, 24 May 2015 (UTC)
@Pasleim: it's planned, but not for the moment... The problem I have with the French data is that you only have the membership at a given moment t, not a start time (P580). Kvardek du (talk) 13:20, 25 May 2015 (UTC)
Kvardek du: then use retrieved (P813) in the reference and leave out start time (P580) and point in time (P585). Joe Filceolaire (talk) 08:33, 23 July 2015 (UTC)
Filceolaire: yeah, but I have a retrieved (P813) at t2 which is different from my point in time (P585)... Kvardek du (talk) 15:47, 24 July 2015 (UTC)
If you don't know the 'start time', then leave it out. If you want, you can create a separate item for the document that the data comes from, add the point in time statement to that item, and then cite that document item in the references for the 'located in ... entity' statements. Look at it this way: the 'point in time' date relates to the info in the document (true on that date).
Note that population figures should have a 'point in time' qualifier to say when that population figure applies, since a population figure is not true for a period; it is only true for the day it was measured. Joe Filceolaire (talk) 00:55, 25 July 2015 (UTC)
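A hedged sketch of the layout suggested above: population (P1082) with a 'point in time' (P585) qualifier, and retrieved (P813) together with a reference URL (P854) inside the reference. The item, figure, dates and URL are placeholders.

    # Sketch only: population statement with a point-in-time qualifier and a
    # reference carrying the source URL and the retrieval date.
    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    def add_population(commune_qid, population, census_date, source_url, access_date):
        item = pywikibot.ItemPage(repo, commune_qid)
        item.get()

        claim = pywikibot.Claim(repo, "P1082")  # population
        claim.setTarget(pywikibot.WbQuantity(amount=population, site=repo))
        item.addClaim(claim, summary="Adding INSEE population figure")

        point_in_time = pywikibot.Claim(repo, "P585")
        point_in_time.setTarget(pywikibot.WbTime(year=census_date.year,
                                                 month=census_date.month,
                                                 day=census_date.day))
        claim.addQualifier(point_in_time)

        ref_url = pywikibot.Claim(repo, "P854")    # reference URL
        ref_url.setTarget(source_url)
        retrieved = pywikibot.Claim(repo, "P813")  # retrieved
        retrieved.setTarget(pywikibot.WbTime(year=access_date.year,
                                             month=access_date.month,
                                             day=access_date.day))
        claim.addSources([ref_url, retrieved])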

AviBot[edit]

AviBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Offthewoll (talk • contribs • logs)

Task/s: Retrieve information about universities on Wikidata.

Code: Can provide upon request.

Function details: Retrieve information about universities on Wikidata. This bot is for reads only, no editing. It uses a small Python script I've written to get a list of entities using the wdq API and then get information about each one using wbgetentities with the Wikidata API. --Offthewoll (talk) 21:29, 17 May 2016 (UTC)
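For what it's worth, a minimal read-only sketch along those lines; the SPARQL query below stands in for the older wdq call, and the query, limit and fields are only examples.

    # Read-only sketch: list a few universities, then fetch their entity data.
    import requests

    HEADERS = {"User-Agent": "university-reader-sketch/0.1"}
    SPARQL = "SELECT ?item WHERE { ?item wdt:P31 wd:Q3918 . } LIMIT 10"

    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": SPARQL, "format": "json"},
                     headers=HEADERS)
    qids = [row["item"]["value"].rsplit("/", 1)[-1]
            for row in r.json()["results"]["bindings"]]

    r = requests.get("https://www.wikidata.org/w/api.php",
                     params={"action": "wbgetentities", "ids": "|".join(qids),
                             "props": "labels|claims", "languages": "en",
                             "format": "json"},
                     headers=HEADERS)
    for qid, entity in r.json()["entities"].items():
        print(qid, entity.get("labels", {}).get("en", {}).get("value"))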

@Offthewoll: Why do you think you need a bot flag? Are you going to hit any API limitations? Matěj Suchánek (talk) 06:44, 9 June 2017 (UTC)
Or maybe https://query.wikidata.org/ (Manual) fits your needs? --XXN, 14:28, 23 June 2017 (UTC)
@Offthewoll: Is this request still needed? If so, please answer the questions raised. Mbch331 (talk) 20:37, 25 August 2017 (UTC)

mahirbot[edit]

mahirbot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Mdmahir (talk • contribs • logs)

Task/s:

  1. To update descriptions of Tamil films, in both English and Tamil, for 5000+ items.
  2. To update the Wikimedia category for Tamil films (5000+ items)


Code: http://tools.wmflabs.org/wikidata-todo/quick_statements.php

Data source: wikidataquery


Function details:

  • Description(English): Tamil Film (2014)
  • Description(Tamil): தமிழ்த் திரைப்படம் (2014)

Note: 2014 is the production year of the film

Because this covers 5000+ items, I prefer to use a bot account with community consensus. Thanks --Mdmahir (talk) 04:22, 25 February 2016 (UTC)
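In case it helps, a small sketch of generating the matching QuickStatements (v1, tab-separated) rows; the QIDs and years below are placeholders, not real items.

    # Sketch only: emit QuickStatements rows that set the English (Den) and
    # Tamil (Dta) descriptions for each film item.
    films = [("Q12345", 2014), ("Q67890", 2015)]  # (item, production year)

    for qid, year in films:
        print(f'{qid}\tDen\t"Tamil Film ({year})"')
        print(f'{qid}\tDta\t"தமிழ்த் திரைப்படம் ({year})"')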

Perhaps you can apply for a flood flag (for the created bot account or your account) on WD:BN, as this is a one-time task using a mass editing tool. XXN, 17:53, 10 May 2017 (UTC)
Pinging @Emijrp: as he has an approved bot for a similar task, filling in better descriptions for film items. XXN, 18:16, 10 May 2017 (UTC)

My code is available if Mdmahir (talk • contribs • logs) wants to use it. I can't add new languages myself right now. I am off for a few weeks. Emijrp (talk) 19:03, 13 May 2017 (UTC)

@Mdmahir: - Is there still need for this request and do you want to use the code by Emijrp? Mbch331 (talk) 20:34, 25 August 2017 (UTC)

SaamDataImportBot[edit]

SaamDataImportBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Rkbrasse (talk • contribs • logs)

Task/s: Import Smithsonian American Art Museum-related data, including videos, publications, exhibitions ...

Code: in the works

Function details: The first set of data imported will be about videos related to exhibitions and artists that we have published on YouTube. We will branch out to general museum data, exhibition data, object data and publication data where not already present. I will be more specific once the function is more nailed down and tested.

Video data mapping[edit]

  • We need a new entity type called Online Video that will contain the following properties (a rough mapping sketch follows below):
  • Title maps to P1476
  • URL maps to streaming media
  • Thumbnail needs to map to an image property

--Rkbrasse (talk) 18:45, 20 April 2016 (UTC)
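A rough sketch of how that mapping could look in code, assuming 'streaming media' means streaming media URL (P963) and the thumbnail goes to image (P18); both property choices are my reading of the request, not something fixed here.

    # Hypothetical field-to-property mapping for one museum video record.
    VIDEO_PROPERTY_MAP = {
        "title": "P1476",    # title (stated above)
        "url": "P963",       # streaming media URL (assumed)
        "thumbnail": "P18",  # image (assumed; P18 expects a Commons file, not a URL)
    }

    def to_statements(video_record):
        """Turn a museum video record (a dict) into (property, value) pairs."""
        return [(VIDEO_PROPERTY_MAP[field], value)
                for field, value in video_record.items()
                if field in VIDEO_PROPERTY_MAP]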

I get a feeling this is better suited for the Structured Data on Commons project. Mbch331 (talk) 20:30, 25 August 2017 (UTC)

welvon-bot[edit]

welvon-bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Welvon-bot (talk • contribs • logs)

Task/s: Add properties to a Wikidata item by mining the text of the Wikipedia articles that belong to the item.

Code: Not implemented yet!

Function details:

  1. Scan the first and/or second paragraph of the Wikipedia article, which usually defines the subject.
  2. The text scanned from the article is the input to the model, which will analyse the text.
  3. The model's output should be the properties of the Wikidata item.
  4. Using the API, the properties of the Wikidata item are updated.
  5. Restart from step 1.

--Welvon-bot (talk) 08:56, 1 May 2016 (UTC)
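A skeleton of that loop, with the text-mining model left as the unimplemented stub it currently is; the function names are mine, not the bot's.

    # Sketch only: fetch the lead text of an article, run a (stubbed) model
    # over it, and report the statements a real bot would then write back.
    import requests

    def lead_paragraphs(title, lang="en"):
        """Fetch the plain-text intro of a Wikipedia article (TextExtracts)."""
        r = requests.get(f"https://{lang}.wikipedia.org/w/api.php",
                         params={"action": "query", "prop": "extracts",
                                 "exintro": 1, "explaintext": 1,
                                 "titles": title, "format": "json"})
        pages = r.json()["query"]["pages"]
        return next(iter(pages.values())).get("extract", "")

    def extract_statements(text):
        """Placeholder for the text-mining model: return (property, value) pairs."""
        return []  # not implemented yet

    def run(title):
        for prop, value in extract_statements(lead_paragraphs(title)):
            # A real bot would add these via the API (e.g. wbcreateclaim),
            # after checking the statement is not already present.
            print(prop, value)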

Who's going to operate the bot? The operator can't be the bot itself. Mbch331 (talk) 05:40, 24 August 2017 (UTC)