Shortcuts: WD:RFBOT, WD:BRFA, WD:RFP/BOT

Wikidata:Requests for permissions/Bot

To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks.


Old requests go to the archive.

Once consensus is obtained in favor of granting the botflag, please post requests at the bureaucrats' noticeboard.

Bot name | Request created | Last editor | Last edited
MsynBot 1 | 2017-06-20, 18:09:45 | Jc3s5h | 2017-06-21, 12:49:16
Mr.Ibrahembot 4 | 2017-06-16, 17:43:01 | Ymblanter | 2017-06-19, 19:20:38
MatSuBot 8 | 2017-06-15, 16:40:47 | ArthurPSmith | 2017-06-15, 19:14:44
AndreCostaWMSE-bot 3 | 2017-06-13, 19:22:59 | Ymblanter | 2017-06-15, 20:28:02
PoliticianBot 1 | 2017-06-09, 06:00:16 | Matěj Suchánek | 2017-06-20, 18:46:04
MexBot 2 | 2017-06-08, 03:00:53 | MarcAbonce | 2017-06-13, 00:50:21
PokestarFanBot 4 | 2017-06-07, 00:14:32 | Ymblanter | 2017-06-14, 18:28:54
APSbot 3 | 2017-06-06, 20:05:33 | Lymantria | 2017-06-12, 08:19:48
Catabot 2 | 2017-06-04, 14:58:20 | Ymblanter | 2017-06-12, 19:07:59
PokestarFanBot 3 | 2017-06-03, 17:03:08 | Ymblanter | 2017-06-14, 18:29:34
AliciaFagervingWMSE-bot 6 | 2017-06-02, 09:20:27 | Ymblanter | 2017-06-08, 10:20:26
AliciaFagervingWMSE-bot 5 | 2017-05-29, 13:54:57 | Ymblanter | 2017-06-01, 19:43:39
WhidouBot | 2017-05-26, 14:00:37 | Ymblanter | 2017-06-09, 19:22:15
JneubertAutomated 2 | 2017-05-17, 10:53:15 | Ymblanter | 2017-05-24, 09:44:13
CC0-JS | 2017-05-16, 08:15:22 | Ymblanter | 2017-05-24, 09:41:13
Pathwaybot | 2017-05-02, 19:05:04 | Ymblanter | 2017-05-18, 19:53:37
Twofivesixbot | 2017-05-14, 20:24:40 | Ymblanter | 2017-05-24, 09:39:18
PokestarFanBot 2 | 2017-05-09, 22:48:54 | Ymblanter | 2017-06-14, 18:30:05
Polish Monuments | 2017-05-09, 13:59:54 | Ymblanter | 2017-05-14, 18:23:15
JneubertAutomated | 2017-05-08, 14:32:54 | Ymblanter | 2017-05-14, 18:21:25
Catabot | 2017-04-17, 22:35:35 | Ymblanter | 2017-05-01, 16:53:23
Mr.Ibrahembot 3 | 2017-04-13, 10:16:11 | Ymblanter | 2017-04-18, 06:30:44
BacDiveBot | 2017-03-28, 17:17:15 | Ymblanter | 2017-04-21, 20:37:25
MatSuBot 7 | 2017-03-25, 17:45:58 | Ymblanter | 2017-06-09, 19:24:00
Emijrpbot 8 | 2017-03-25, 11:42:28 | Matěj Suchánek | 2017-06-09, 06:47:22
ZacheBot | 2017-03-04, 23:29:38 | Ymblanter | 2017-03-14, 21:20:39
НСБот | 2017-02-24, 12:12:11 | Ymblanter | 2017-03-03, 08:48:41
MechQuesterBot 1 | 2017-02-26, 22:31:51 | XXN | 2017-06-04, 23:06:27
YULbot | 2017-02-21, 18:05:13 | Ymblanter | 2017-03-14, 21:12:25
JayWackerBot | 2017-02-09, 17:26:47 | JayWacker | 2017-03-01, 18:18:50
YBot | 2017-01-12, 16:43:19 | Pasleim | 2017-01-15, 19:26:39
EaasServiceBot | 2017-01-10, 15:09:13 | Ymblanter | 2017-03-05, 00:00:16
DiscogsBot | 2016-12-12, 11:32:55 | Pasleim | 2017-01-15, 19:52:08
DoctorBot | 2016-11-27, 03:01:24 | DoctorBud | 2016-12-21, 00:51:03
WikiLovesESBot | 2016-07-03, 10:25:13 | Jura1 | 2016-08-26, 08:42:53
MatSuBot 6 | 2016-07-01, 19:12:23 | Matěj Suchánek | 2017-06-20, 18:49:13
1-Byte-Bot | 2016-03-02, 15:23:09 | 1-Byte | 2016-03-03, 08:58:36
Hkn-bot | 2016-01-16, 18:52:00 | Pasleim | 2017-01-15, 20:02:36
RollBot | 2016-01-14, 17:01:42 | Ymblanter | 2017-04-25, 19:25:50
Dexbot 11 | 2015-04-07, 18:15:00 | Ladsgroup | 2017-05-12, 14:56:50
KunMilanoRobot | 2014-01-21, 19:27:44 | Alphama | 2016-06-21, 18:15:06
AviBot | 2016-05-17, 21:29:54 | Matěj Suchánek | 2017-06-09, 06:44:54
fabot | 2017-02-12, 19:51:38 | Vogone | 2017-02-12, 19:51:38
florentyna | 2017-02-12, 19:50:21 | Vogone | 2017-02-12, 19:50:21
InteliBOT | 2015-01-16, 20:14:11 | Vogone | 2017-02-12, 19:49:42
Mahirbot | 2016-02-25, 04:22:59 | Emijrp | 2017-05-13, 19:04:03
pywikibot | 2017-02-12, 19:47:50 | Vogone | 2017-02-12, 19:47:50
SaamDataImportBot | 2016-04-20, 18:45:43 | 160.111.254.17 | 2016-05-02, 14:17:01
welvon-bot | 2017-02-12, 19:47:23 | Vogone | 2017-02-12, 19:47:23

MsynBot 1

MsynBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: MisterSynergy (talk • contribs • logs)

Task/s: tidy claims of properties with quantity datatype; this means in particular: corrections of units; removal of inapplicable bounds; occasionally correction of quantity values including source addition

Code: in PAWS; maybe later in BitBucket as well

Function details:

  • The functional goal is already described above
  • I use pywikibot with PAWS right now, all scripts are hosted there as well; I consider switching to a tool labs installation of pywikibot at some point later, but this might not happen soon; code would then be hosted at my BitBucket repository
  • I already used a couple of self-written scripts for some smaller correction jobs with my regular account. Example: removal of bounds. However, there are two jobs planned that include a five-figure number of edits each, so this is no longer a small correction job
  • The scripts have hard-coded lists of items and properties which they should touch; there is no automatic item retrieval via querying, and no permanent operation is intended (i.e. one-time job for now)
  • Corrections of multiple claims are bundled in one edit, if possible (see example).

MisterSynergy (talk) 18:09, 20 June 2017 (UTC)

Support Looks good to me. Matěj Suchánek (talk) 18:42, 20 June 2017 (UTC)
Although you show one edit of your own as an example, could your bot perform a couple of test edits? Lymantria (talk) 20:49, 20 June 2017 (UTC)
I will do so soon and inform you with Ping on this page. After I handed in this RfP yesterday, I decided to try to get my Tool Labs pywikibot installation running, in order not to rely on PAWS. This needs some config, but things look good already. --MisterSynergy (talk) 04:51, 21 June 2017 (UTC)
It seems to me bots should be described in a way that editors who are not conversant with whatever language the bot is written in can still understand what the bot is supposed to do. This description seems inadequate to me. Jc3s5h (talk) 22:00, 20 June 2017 (UTC)
I am willing to add more detail, but unfortunately I am not fully sure which part of the idea needs more explanation. Can you please ask more specifically? Thanks, MisterSynergy (talk) 04:51, 21 June 2017 (UTC)
tidy claims of properties with quantity datatype; this means in particular: corrections of units; removal of inapplicable bounds; occasionally correction of quantity values including source addition is not a good description? The stuff below are Function details. Matěj Suchánek (talk) 06:23, 21 June 2017 (UTC)

Okay, more background:

On quantity properties in general (Help:Data type#Quantity): quantity claims can have snaktypes novalue, somevalue and value just as all other claims. In case of a value snaktype, the value consists of up to four parts:

  1. amount, always a numerical value (mandatory)
  2. unit, either string '1' (means “no unit”, appears as 1 (Q199) sometimes) or the entity representation of a unit item, such as string 'http://www.wikidata.org/entity/Q11574' for unit second (Q11574); the unit part is mandatory as well, even for quantities which are unit-less
  3. upperBound and lowerBound, absolute numerical values, always (?) symmetrical around amount (i.e. 100±1 has upperBound=101 and lowerBound=99; 100±0 has upperBound=100 and lowerBound=100); this was mandatory in the past, but it is not any longer and we can store bound-less quantities where these fields are just not there; bounds express the uncertainty interval of the quantity

Since we cannot use snaktypes for the individual parts of a quantity, we need to signal the absence of quantity parts differently. Oddly, this is inconsistently solved right now: “no unit” is expressed by 'unit':'1', while “no bounds” (i.e. no uncertainty interval) is expressed by the absence of lowerBound and upperBound. Bounds and units with “somevalue” character should not happen and can be ignored here in my experience.
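To make the structure above concrete, here is a minimal sketch that inspects a quantity value held as a plain Python dict. It assumes the value has the same shape as the quantity datavalue in the entity JSON (amounts and bounds as signed strings such as "+100"); the helper name describe_quantity and the sample values are illustrative only and are not taken from the bot's scripts.

```python
from typing import Optional

NO_UNIT = "1"  # string used for unit-less quantities, as described above


def describe_quantity(value: dict) -> dict:
    """Summarise which parts of a quantity value are present."""
    amount = float(value["amount"])  # e.g. "+100" -> 100.0
    unit = value["unit"]             # mandatory, even when unit-less
    has_bounds = "upperBound" in value and "lowerBound" in value
    uncertainty: Optional[float] = None
    if has_bounds:
        # Bounds are absolute values, so the half-width is upperBound - amount.
        uncertainty = float(value["upperBound"]) - amount
    return {
        "amount": amount,
        # Keep only the item ID (e.g. "Q11574") from the entity URI.
        "unit_item": None if unit == NO_UNIT else unit.rsplit("/", 1)[-1],
        "has_bounds": has_bounds,
        "uncertainty": uncertainty,
    }


# Illustrative values: 100±1 seconds, and a unit-less, bound-less 3.
with_bounds = {
    "amount": "+100",
    "unit": "http://www.wikidata.org/entity/Q11574",
    "upperBound": "+101",
    "lowerBound": "+99",
}
without_bounds = {"amount": "+3", "unit": "1"}

print(describe_quantity(with_bounds))
print(describe_quantity(without_bounds))
```

Note the asymmetry the text points out: the missing unit is an explicit marker ('1'), while missing bounds are simply absent keys, so the two checks have to be written differently.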

On the situation of units used in properties:

  • If an editor enters a quantity claim into an item via the web interface, the unit field is marked “optional”, although the unit isn’t really optional for many properties. If errors happen due to forgotten/ignored units, they cannot be resolved automatically.
  • However, editors occasionally select the wrong unit by mistake, e.g. second (Q636099) instead of second (Q11574), and this can in fact be fixed in certain narrowly defined cases.
  • There have also been (automatic) data imports in which the unit was apparently just forgotten and the quantity appears unit-less, although it actually should have a unit. There was a discussion recently at WD:PC (Wikidata:Project chat#Unitless claims) about what to do with those cases, including case numbers per quantity property for a couple of cases. I found that a large fraction of items referred to there did indeed have proper sources provided in an external identifier, that can be crawled and evaluated automatically and that provide information about the missing unit. This can be fixed automatically as well in some cases.

On the situation of bounds:

  • Due to the changes of how we store bounds in quantity units, they are used quite inconsistently right now. For a while, there was automatically something like ±1 (or ±0?) added to quantity values. This uncertainty interval was simply derived from the precision of the amount given, and very annoying for the editors who did not ask for it. I believe that most of these bounds are in fact wrong, but there is unfortunately little we can do to correct them automatically. However, after bounds became optional, editors continued and used quantities without bounds to correctly express the non-existence of an uncertainty interval for the given claim. There are plenty of items which use both styles at the same time.
  • There are even many quantity properties which are in fact (more or less) completely unsuitable for bounds. Uncertainty expresses imperfect information, such as it is the case for physical quantities in particular and most quantities which have been measured/extrapolated/simulated/… in general. Quantity data type is used for properties which inherently do not have uncertainties, such as:
    • Elo rating (P1087): calculated with an algorithm, based on input which is completely known and in the past
    • ranking (P1352): used a lot for sports results
    • maximum capacity (P1083): used for sports venues; not a measured quantity, but merely a regulation by some authority
    • number of children (P1971): simply counted; if unclear (e.g. amount is 2 or 3), we’d rather provide claims with different values and sources than an amount of “2.5±0.5”; the same applies to many other “number of …” properties
  • I proposed a “no bounds” constraint at Template talk:Constraint#Requests: Integer value and No bounds recently, but unfortunately this was ignored until now. It would help a lot to improve use of bounds.

How I plan to work:

  • The current version of the script is at [1]; I am not sure whether it is visible to other users, one at least needs to log on to PAWS. However, I changed plans and set up my pywikibot installation at Tool Labs, which would require some changes to the script (separate input data from the actual script). I already have a BitBucket repo which will then hold the code of the python scripts. If you cannot see the code in PAWS right now but you want to look at it, I will provide it differently on request.
  • I start with manual queries to get an impression of unit or bounds use, usually per property. If I encounter a fixable situation, I create (python) lists of items to be worked on, such as inputdata = [ 'Q1', 'Q2', 'Q42' ]. The script also knows which quantity properties to work on and loops over all items and properties (or qualifiers) to correct data if necessary. Examples would be:
    • Remove bounds from a given property or qualifier only if they are ±0 on a predefined set of items
    • Replace a given unit by another one in a given property or qualifier on a predefined set of items
  • Although we do have a lot of quantity properties with problems (as outlined above) right now, I do not plan to work on a reasonable fraction of them. Most work will focus on sports-related properties, and if someone asks for different data/properties to work on I will provide help there as well of course. If necessary, I will show up on property talk pages to discuss my plans.
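The two example jobs listed above could look roughly like the following sketch. It works on plain dicts standing in for quantity values rather than on live items, so it is not the script hosted in PAWS; the function names, the claims mapping and its numbers are assumptions made for illustration, and only the inputdata list and the two unit items come from the description above.

```python
import copy
from decimal import Decimal

ENTITY = "http://www.wikidata.org/entity/"


def strip_zero_bounds(quantity: dict) -> dict:
    """Return a copy without bounds, but only if the bounds are exactly ±0."""
    result = copy.deepcopy(quantity)
    if "upperBound" in result and "lowerBound" in result:
        amount = Decimal(result["amount"])
        if Decimal(result["upperBound"]) == amount == Decimal(result["lowerBound"]):
            del result["upperBound"]
            del result["lowerBound"]
    return result


def replace_unit(quantity: dict, wrong_unit: str, right_unit: str) -> dict:
    """Return a copy in which wrong_unit is replaced by right_unit."""
    result = copy.deepcopy(quantity)
    if result.get("unit") == wrong_unit:
        result["unit"] = right_unit
    return result


# Hard-coded work list, as in the description; no automatic item retrieval.
inputdata = ["Q1", "Q2", "Q42"]

# Hypothetical stand-in for the quantity values found on those items.
claims = {
    "Q1": {"amount": "+2500", "unit": "1", "upperBound": "+2500", "lowerBound": "+2500"},
    "Q2": {"amount": "+90", "unit": ENTITY + "Q636099"},
    "Q42": {"amount": "+100", "unit": "1", "upperBound": "+101", "lowerBound": "+99"},
}

corrected = {}
for item_id in inputdata:
    quantity = strip_zero_bounds(claims[item_id])
    quantity = replace_unit(quantity, ENTITY + "Q636099", ENTITY + "Q11574")
    corrected[item_id] = quantity

print(corrected)
```

Both helpers return copies and leave non-matching values untouched, which keeps each correction narrow: a ±1 interval, for instance, is never removed by the ±0 rule.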

MisterSynergy (talk) 10:14, 21 June 2017 (UTC)

Support In view of the way the bot will be run on well-identified sets of data with well-understood errors, I support it. Jc3s5h (talk) 12:49, 21 June 2017 (UTC)

MatSuBot 8

MatSuBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Matěj Suchánek (talk • contribs • logs)

Task: Change imported from (P143) references of inverse statements.

Code: Pywikibot

Function details: When statements are added to complete a symmetric or inverse pair, some users add imported from (P143) reference with the second item as its value. This is inconsistent with the purpose of the property, which is used to indicate a wiki where the claim was imported from. (As an edge case, if there was an inverse statement imported from the wiki item, eg. English Wikipedia (Q328), it wouldn't be clear whether the item or the wiki is meant.) corresponding Wikidata item (Q20651139) is the dedicated item for those cases. The bot is supposed to change all imported from (P143) references where the value is same as the main value of the statement to corresponding Wikidata item (Q20651139). (Example: Special:PermaLink/440044797#P26.)
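As a rough illustration of the rule described above, the sketch below rewrites references on a single claim held as a dict shaped like the entity JSON. It is not the Pywikibot code of the bot; the function name and the sample claim (with the placeholder item Q100) are assumptions for illustration.

```python
P_IMPORTED_FROM = "P143"
Q_CORRESPONDING_ITEM = "Q20651139"


def fix_inverse_references(claim: dict) -> int:
    """Point P143 reference values that equal the claim's main value to Q20651139.

    Returns the number of reference snaks changed. The claim is modified in place.
    """
    main_value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
    if not isinstance(main_value, dict) or "id" not in main_value:
        return 0  # novalue/somevalue or a non-item main value: nothing to do
    main_id = main_value["id"]

    changed = 0
    for reference in claim.get("references", []):
        for snak in reference.get("snaks", {}).get(P_IMPORTED_FROM, []):
            value = snak.get("datavalue", {}).get("value")
            if isinstance(value, dict) and value.get("id") == main_id:
                value["id"] = Q_CORRESPONDING_ITEM
                if "numeric-id" in value:
                    value["numeric-id"] = int(Q_CORRESPONDING_ITEM[1:])
                changed += 1
    return changed


# Hypothetical spouse (P26) claim whose reference wrongly points back to the spouse item.
claim = {
    "mainsnak": {"property": "P26", "datavalue": {"value": {"id": "Q100"}}},
    "references": [
        {"snaks": {"P143": [{"property": "P143", "datavalue": {"value": {"id": "Q100"}}}]}}
    ],
}
print(fix_inverse_references(claim))
```

A reference pointing to an actual wiki such as English Wikipedia (Q328) is left alone unless Q328 is also the main value, which is the edge case mentioned above.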

Note: there is also inferred from (P3452), which duplicates the sourcing strategy of corresponding Wikidata item (Q20651139). If there's a consensus that this item should be replaced in favor of inferred from (P3452), this task can be adjusted to use this property. --Matěj Suchánek (talk) 16:40, 15 June 2017 (UTC)

I didn't know about corresponding Wikidata item (Q20651139). It seems a little confusing - following the link to that item would not be particularly helpful. I think inferred from (P3452) pointing to the item is preferable (though the description is reasonably clear). ArthurPSmith (talk) 19:13, 15 June 2017 (UTC)

PoliticianBot 1

PoliticianBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Knuthuehne (talk • contribs • logs)

Task/s: Add legislative term qualifiers to German politicians.

Code: Github

Function details: We would like to add additional information to German politicians. They have the `P39` (is member of) property with the value of the corresponding local parliament already.

We want to add an additional qualifier `P2937` (legislative period) with a value of the corresponding current period. An example for this can be found on Ernst-Ulrich Alda

The list of the politicians we would like to edit has been derived from scraping the parliament websites and comparing them to wikidata items that have the property (`Member of Landtag <state here>`) already. --Knuthuehne (talk) 05:59, 9 June 2017 (UTC)
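The qualifier step described above might be sketched as follows, again on a claim held as a plain dict in the shape of the entity JSON. This is not the code from the GitHub repository; the function name and the placeholder IDs Q_PARLIAMENT and Q_TERM_17 are hypothetical.

```python
P_TERM = "P2937"  # the legislative period qualifier named in the request


def add_term_qualifier(claim: dict, term_item: str) -> bool:
    """Add a P2937 qualifier for term_item unless the claim already has it.

    Returns True if the claim was changed. The claim is modified in place.
    """
    qualifiers = claim.setdefault("qualifiers", {})
    existing = qualifiers.setdefault(P_TERM, [])
    for snak in existing:
        value = snak.get("datavalue", {}).get("value", {})
        if value.get("id") == term_item:
            return False  # already present, so the edit can be skipped
    existing.append({
        "snaktype": "value",
        "property": P_TERM,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": term_item},
        },
    })
    return True


# Hypothetical P39 claim for a member of a state parliament.
membership = {
    "mainsnak": {"property": "P39", "datavalue": {"value": {"id": "Q_PARLIAMENT"}}}
}
print(add_term_qualifier(membership, "Q_TERM_17"))  # True: qualifier added
print(add_term_qualifier(membership, "Q_TERM_17"))  # False: nothing to do
```

Checking for an existing qualifier first makes the run repeatable: politicians already edited in the test run are skipped rather than edited twice.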

Can your bot please perform some test edits? Lymantria (talk) 10:00, 9 June 2017 (UTC)
I added the qualifier for the new (now 17th) period for five politicians as an example yesterday: https://www.wikidata.org/wiki/Special:Contributions/PoliticianBot Not all of the politicians of the new Landtag exist in Wikidata yet but there are not too many that are missing and I would create those in another step and just let the bot run for the existing ones for now. Knuthuehne (talk) 07:31, 18 June 2017 (UTC)
Please create the user page for the bot's account ({{bot|Knuthuehne}}) and consider adding a reference to modified claims as well. Otherwise looks good to me, thanks for your data donation. Matěj Suchánek (talk) 18:45, 20 June 2017 (UTC)

MexBot 2

MexBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: MarcAbonce (talk • contribs • logs)

Task/s: Add official population data for Mexican municipalities.

Code: https://gitlab.com/a01200356/MexBot/blob/master/poblaciones.py

Function details:
The script finds all Mexican municipalities with an INEGI municipality ID and gets all the official population data available from INEGI's (Mexican public institute that does the census) API.
It will either add or update this data, with INEGI as the source.
It will also add census as the method for years ending in 0, when the census is conducted.
MarcAbonce (talk) 21:45, 8 June 2017 (UTC)
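The rule about the census method could be expressed along these lines. This is only a sketch and not taken from poblaciones.py; the function name, the record layout and the population figures are made up for illustration.

```python
from typing import Dict, List


def build_population_records(populations: Dict[int, int]) -> List[dict]:
    """Turn a {year: population} mapping into statement-like records.

    Every record is sourced to INEGI; years ending in 0 are also marked
    as determined by census, following the rule described above.
    """
    records = []
    for year in sorted(populations):
        record = {
            "year": year,
            "population": populations[year],
            "source": "INEGI",
        }
        if year % 10 == 0:
            record["method"] = "census"
        records.append(record)
    return records


# Hypothetical figures for a single municipality.
for record in build_population_records({2010: 12345, 2005: 11800, 2015: 13010}):
    print(record)
```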

Support --PokestarFan • Drink some tea and talk with me • Stalk my edits 23:16, 8 June 2017 (UTC)
Comment: Under which license does INEGI publish population data? XXN, 14:41, 9 June 2017 (UTC)
It is not stated explicitly, but it is like CC BY; see point f in section "Del libre uso de la información del INEGI" of Términos de uso. I don't think it is compatible. --ValterVB (talk) 17:35, 9 June 2017 (UTC)
Indeed, it only requires attribution, which is precisely what my script intends to add. Why would it be incompatible? Most of this data has already been manually added by people and apparently a Wikipedia scraping script too, but it's mostly unsourced. --MarcAbonce (talk)
Here we use CC0; if data here needs citation, it is incompatible with the license. --ValterVB (talk) 05:47, 11 June 2017 (UTC)
Can census data even be licensed, though? As far as I know, facts cannot be licensed anywhere. If this is the case, this license would only be enforceable with the statistical data they generate (which I'm not using) but it wouldn't be enforceable for a simple, "natural" fact such as a total population.
Also, as I mentioned, this data is already allowed in practice. Wikipedia importing bots have added census data into Wikidata by claiming Wikipedia as the source (which is also CC0 incompatible, by the way), but this data is not generated by Wikipedia, but rather taken from INEGI and imported without source.
So, unless you actually plan to delete all the unsourced and Wikipedia sourced Mexican population data from this site, the most reasonable thing to do would be to treat this data the way it has been treated so far, for the sake of consistency.
--MarcAbonce (talk)

Emijrpbot 8

Emijrpbot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Emijrp (talk • contribs • logs)

Task/s:

Bot adds imported from (P143) references to Wikinews article (Q17633526) items. Particularly it adds references to instance of (P31) and language of work or name (P407) properties. See example

Code: not coded yet

Function details:

Bot uses sitelink to detect which language version of Wikinews hosts the article, and it adds the imported from (P143) reference. When there is more than one sitelink, it picks just one (the largest Wikinews), based on number of articles.

--Emijrp (talk) 11:42, 25 March 2017 (UTC)
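Since the request says the bot is not coded yet, the selection rule can only be sketched. The version below assumes sitelinks are identified by database names ending in "wikinews" and that article counts are available as a mapping; the function name and the counts shown are made up.

```python
from typing import Dict, Iterable, Optional


def pick_source_wiki(sitelinks: Iterable[str], article_counts: Dict[str, int]) -> Optional[str]:
    """Pick the Wikinews sitelink with the most articles, or None if there is none."""
    wikinews = [site for site in sitelinks if site.endswith("wikinews")]
    if not wikinews:
        return None
    # Wikis missing from article_counts are treated as having zero articles.
    return max(wikinews, key=lambda site: article_counts.get(site, 0))


# Hypothetical article counts; real numbers would have to be looked up.
counts = {"enwikinews": 20000, "eswikinews": 10000}
print(pick_source_wiki(["eswikinews", "enwikinews"], counts))
print(pick_source_wiki(["eswikinews"], counts))
```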

For my opinion, see my comment in the previous request for permission. Matěj Suchánek (talk) 17:53, 25 March 2017 (UTC)
  • Comment It's good to add "imported from" as a "source" when importing data from Wikipedia (or Wikinews here), but I don't think it adds much in terms of references. To calculate ratios, one might as well ignore it. For P31, such ratios probably don't add much anyways.
    --- Jura 18:19, 25 March 2017 (UTC)
  • @Matěj Suchánek, Jura1:, are we ready for approval given that the previous one was withdrawn?--Ymblanter (talk) 16:04, 7 April 2017 (UTC)

ZacheBot

ZacheBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Zache (talk • contribs • logs)

Task/s: Import data from pre-created CSV lists.

Code: based on Pywikibot (Q15169668), sample import scripts [12]

Function details:

--Zache (talk) 23:29, 4 March 2017 (UTC)

@Zache:, could you pls make a couple of test edits, I do not see any lakes in the contribution of the bot.--Ymblanter (talk) 21:20, 14 March 2017 (UTC)

НСБот

НСБот (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Nikola Smolenski (talk • contribs • logs)

Task/s: Import Zapis database of Wikimedia Serbia.

Code: Code not yet written.

Function details: I need a bot to import data from the database of Zapis trees as a part of Zapis - Sacred Tree project of Wikimedia Serbia. This will essentially be data from the table at sr:Списак записа у Србији#Табела регистрованих записа though there are additional data (such as tree height and similar).

The first bot task should be to fix data about municipalities of Serbia, examples of manual edits: [13] and [14]. Then it should create items about cadastral municipalities of Serbia, then about the trees.

I have previously operated commons:User:NSBot and sr:Корисник:НСБот without any problems. --Nikola (talk) 12:08, 24 February 2017 (UTC)

@Nikola Smolenski:, please register the bot account and make some test edits.--Ymblanter (talk) 08:48, 3 March 2017 (UTC)

MechQuesterBot

MechQuesterBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: MechQuester (talk • contribs • logs)

Task/s: Add description to a bunch of villages in China.

Code:

SPARQL Code: select ?item {?item wdt:P31 wd:Q13100073}

Function details: I am refiling this bot request. It is to add descriptions to the villages in China. --MechQuester (talk) 22:31, 26 February 2017 (UTC)

What descriptions? "village in China"? Or in which languages? Steak (talk) 16:17, 28 February 2017 (UTC)
just in Chinese + English. MechQuester (talk) 17:55, 28 February 2017 (UTC)
I am also adding "subdistrict of China" to a few entries. I ran 10 test edits if anyone wants to read. MechQuesterBot (talk) 18:05, 28 February 2017 (UTC)
  • BTW I started looking into the timezone stuff. There are queries at Wikidata:Database_reports/time_zones.
    --- Jura 19:02, 28 February 2017 (UTC)
    • I reviewed some of it and I don't think it shows big problems even if the DST question should be standardized.
      --- Jura 11:31, 5 March 2017 (UTC)

anyone gonna comment any further? 03:13, 2 March 2017 (UTC)

  • There are 590,000 items for villages in China but only 250,000 different labels for villages in China. [15] That means there are many villages sharing the same name. Since the software enforces to have label+description to be unique, you will run into many conflicts. --Pasleim (talk) 12:42, 2 March 2017 (UTC)
    • I think these items could need some edits to make them useful to people who don't speak Chinese. Let him have a go. Eventually maybe a better description will emerge.
      --- Jura 11:31, 5 March 2017 (UTC)
I'm not totally willing to make a go at it now that there are numerous conflicts. MechQuester (talk) 16:37, 5 March 2017 (UTC)
Well, maybe it could be limited by doing something like "village in <adm3>, <adm2>, <adm1>, PRC". As I'm not sure which administrative layers are relevant and stable for China, I can't really suggest more. For the US, this could read "village in Trump County, Wyoming, USA" or "neighborhood of the city of <..>, Trump County, Wyoming, USA". For conflicts that remain, one would need to find a manual improvement. Maybe there is a default way to disambiguate such names.
--- Jura 16:52, 5 March 2017 (UTC)
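The disambiguation idea above, combined with the label+description uniqueness constraint mentioned earlier in the thread, could be prototyped as below. All names and IDs are placeholders; which administrative layers to use for China is left open, as in the comment.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_description(kind: str, admin_chain: Sequence[str], country: str) -> str:
    """Build e.g. 'village in <adm3>, <adm2>, <adm1>, PRC'."""
    return f"{kind} in " + ", ".join(list(admin_chain) + [country])


def find_conflicts(items: Dict[str, Tuple[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group item IDs that share the same (label, description) pair."""
    seen = defaultdict(list)
    for item_id, pair in items.items():
        seen[pair].append(item_id)
    return {pair: ids for pair, ids in seen.items() if len(ids) > 1}


# Placeholder data: two villages with the same label in different counties,
# plus one pair that remains ambiguous and needs manual improvement.
villages = {
    "Q_A": ("Example Village", build_description("village", ["Town A", "County A", "Province X"], "PRC")),
    "Q_B": ("Example Village", build_description("village", ["Town B", "County B", "Province X"], "PRC")),
    "Q_C": ("Other Village", build_description("village", ["Town C", "County C", "Province Y"], "PRC")),
    "Q_D": ("Other Village", build_description("village", ["Town C", "County C", "Province Y"], "PRC")),
}
print(find_conflicts(villages))
```

Running the conflict check before any edits would show how many items still collide after adding the administrative chain, and those could be set aside for manual handling.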
Perhaps there. It would require some form of looking at parent "located in administrative category" but i don't have the coding skill abandoned. Thus a new one is better.

How about a new proposal?

select ?item {?item wdt:P31 wd:Q61878}

Using this to add P421:Q6985. MechQuester (talk) 03:27, 10 March 2017 (UTC)

What about just P421 and Q6985 for every article? MechQuester (talk) 17:18, 10 March 2017 (UTC)

@MechQuester, Jura1, Pasleim:, are we ready for approval here?--Ymblanter (talk) 21:16, 14 March 2017 (UTC)

Different Request

Plan to use descriptioner to add "Beetle" or "Beetle in Cerambycidae family" to en:Category:Lamiinae MechQuester (talk) 03:31, 14 March 2017 (UTC)

Please file it as a separate request so that it could be properly discussed.--Ymblanter (talk) 21:16, 14 March 2017 (UTC)
What happened in items like this?
  • 08:57, 27 March 2017‎ MechQuesterBot (talk | contribs)‎ . . (32,776 bytes) (+76)‎ . . (‎Added [en] description: species of beetle, #quickstatements)
  • 21:40, 4 April 2017‎ MechQuesterBot (talk | contribs)‎ . . (32,798 bytes) (-11)‎ . . (‎Changed English description: species, #quickstatements)
  • 21:41, 4 April 2017‎ MechQuesterBot (talk | contribs)‎ . . (32,809 bytes) (+11)‎ . . (‎Changed English description: species of beetle, #quickstatements)
It's not ok.
And in general, I don't think a quickstatements-based "bot" should be approved for either of those 2 tasks. IMO a real bot is necessary, one which will set multiple descriptions (and labels) at once, instead of making tens to 100+ edits that change the terms for each language separately. XXN, 23:06, 4 June 2017 (UTC)
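Setting several descriptions in one edit, as suggested above, comes down to sending one combined data object instead of one request per language. The sketch below only builds that object in the shape accepted by the wbeditentity API's data parameter; it does not perform any edit, and the function name and the French text are illustrative assumptions.

```python
import json
from typing import Dict


def build_terms_payload(descriptions: Dict[str, str]) -> str:
    """Build a single JSON payload carrying descriptions for several languages."""
    data = {
        "descriptions": {
            lang: {"language": lang, "value": text}
            for lang, text in descriptions.items()
        }
    }
    return json.dumps(data, ensure_ascii=False)


# One payload, hence one edit, covering two languages.
payload = build_terms_payload({
    "en": "species of beetle",
    "fr": "espèce de coléoptères",  # illustrative translation
})
print(payload)
```

Labels could be bundled the same way under a "labels" key, so an item's terms in all target languages would change in a single revision.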

YULbot

YULbot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: YULdigitalpreservation (talk • contribs • logs)

Task/s:

  • YULbot has the task of creating new items for pieces of software that do not yet have items in Wikidata.
  • YULbot will also make statements about those newly-created software items.

Code: I haven't written this bot yet.

Function details:

This bot will set the English language label for these items and create statements using publisher (P123), ISBN-13 (P212), ISBN-10 (P957), place of publication (P291), publication date (P577). --YULdigitalpreservation (talk) 18:04, 21 February 2017 (UTC)

good to run a test with a few examples so we can see what you're planning! ArthurPSmith (talk) 20:46, 22 February 2017 (UTC)
Interesting. Where does the data come from? Emijrp (talk) 12:04, 25 February 2017 (UTC)
The data is coming from the pieces of software themselves. These are pieces of software that are in the Yale Library collection. We could also supplement with data from oldversions.com.YULdigitalpreservation (talk) 13:07, 28 February 2017 (UTC)
Please let us know when the bot is ready for approval.--Ymblanter (talk) 21:12, 14 March 2017 (UTC)

JayWackerBot

JayWackerBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: JayWacker (talk • contribs • logs)

Task/s: This will be used to set and remove Quora topic identifiers. It will also update the matches as Quora topics are renamed or merged. We are manually vetting the 250,000 MixNMatch matches of Quora topic to Wikidata entity. This bot will not update other properties of the Wikidata entity.

Code:

Function details: I'm unsure how much detail is necessary

set_quora_identifier(wikidata_id, quora_relative_url)

remove_quora_identifier(wikidata_id, quora_relative_url)

--JayWacker (talk) 17:25, 9 February 2017 (UTC)

Could you please explain in more detail on which basis you will remove or update Quora topic ids? How will setting Quora topic ids be different from the current approach with Mix'n'Match? --Pasleim (talk) 13:32, 14 February 2017 (UTC)
@Pasleim: Mix'n'Match may be used more rapidly with a bot-flagged account. @JayWacker: You may have missed this question. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:02, 19 February 2017 (UTC)
@Pasleim: First, we are manually vetting the 255,000 matches that MixNMatch identified. As generous as @Pigsonthewing: has been, we are taking his time regularly going through the hundreds of thousands of matches outside of MixNMatch and then giving them to him to set. Additionally, Quora topic names change regularly and are merged together and this results in the URLs changing. This means that the Wikidata-Quora identifiers will be out of date (though still redirected to the correct place). We may also be creating Quora topics from Wikidata entities, which means we can set these identifiers directly. We can also resolve the constraint violations more efficiently. JayWacker (talk) 04:04, 21 February 2017 (UTC)
  • Support. While I'm happy to assist Quora as long as needed, it's right and proper - and welcome - that they should be able to contribute directly. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:58, 21 February 2017 (UTC)
I will approve the bot in a couple of days provided there have been no objections raised.--Ymblanter (talk) 20:19, 26 February 2017 (UTC)
Oops, sorry, should have noticed earlier. Please make a couple of test edits.--Ymblanter (talk) 21:53, 28 February 2017 (UTC)
We'll do a couple of test edits and I'll get back to you (this may be a few weeks to get to the top of the stack). JayWacker (talk) 18:18, 1 March 2017 (UTC)

YBot

YBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Superyetkin (talk • contribs • logs)

Task/s: import data from Turkish Wikipedia

Code: The bot, currently active on trwiki, uses the Wikibot framework.

Function details: The code imports data (properties and identifiers) from trwiki, aiming to ease the path to Wikidata Phase 3 (to have items that store the data served on infoboxes) --Superyetkin (talk) 16:42, 12 January 2017 (UTC)

It would be good if you could check for constraint violations instead of just blindly copying data from trwiki. These violations are probably all caused by the bot. --Pasleim (talk) 19:26, 15 January 2017 (UTC)

EaasServiceBot

EaasServiceBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Sharmeelaashwin (talk • contribs • logs)

Task/s: Bot which talks to EaaS(Emulation-as-Service) to store and retrieve the rendering software and OS for a file format. This helps in opening the files used in Digital Preservation

Code:

Function details: It contains following APIs:

  1. This bot contains an API to store the file formats in Wikidata. This API will be called when the user decides to save this file format information in EaaS
  2. This bot contains an API to read the rendering software's information from Wikidata to open the file formats in EaaS

--Sharmeelaashwin (talk) 15:08, 10 January 2017 (UTC)

Which statements do you plan to add? As far as I know there isn't yet a "rendering software" property. --Pasleim (talk) 19:40, 15 January 2017 (UTC)
I would like to add a new page which stores all this information (file format, rendering software and environment). When a user opens a file format with a particular software, we will store this information in Wikidata, and when another user tries to open the same file, we will fetch the data from Wikidata and open the file with the software name retrieved from Wikidata. I will also store the environment (OS and dependent software) information in Wikidata. --Sharmeelaashwin (talk) 11:08, 16 January 2017 (UTC)
  • There is readable file format (P1072), but I don't quite see how you'd store here which one gets used if several render the same format.
    --- Jura 06:22, 17 January 2017 (UTC)
  • How about creating example items manually? ChristianKl (talk) 07:20, 17 January 2017 (UTC)
How much data do you plan to add? ChristianKl (talk) 07:20, 17 January 2017 (UTC)
  • @Jura: "Readable file format" stores the list of file formats that can be opened in a software. I would like to do just the opposite, i.e. if I have a file format, I would like to have a list of software that can open this file format, and also the OS. This has the following advantages:
    1. If a user tries to open a file in the EaaS (Emulation as Service) application, then from the file format, EaaS can query Wikidata and get a list of software that can open the file requested by the user.
    2. If any Wikidata user knows that a particular file format can be rendered by a software, then he/she can directly update it in Wikidata, which is much easier compared to updating it in PRONOM.
@ChristianKl : I will manually add example items and let you know. In the initial phase I intend to add major file formats like .doc, .jpeg, .ppt, .tx, but the final goal is to store all the file formats in Wikidata. I plan to create a table in a Wikidata page and keep updating the same. -- Sharmeelaashwin (talk) 09:16, 18 January 2017 (UTC)
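The "opposite" lookup described in this reply can be derived from existing readable file format (P1072) statements by inverting them, which may make a separate table unnecessary. A small sketch, with made-up software and format names standing in for real items:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def invert_readable_formats(readable: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Turn {software: formats it reads} into {format: software that reads it}."""
    by_format = defaultdict(set)
    for software, formats in readable.items():
        for file_format in formats:
            by_format[file_format].add(software)
    # Sort for stable output.
    return {fmt: sorted(programs) for fmt, programs in by_format.items()}


# Hypothetical data in the direction the property stores it.
readable = {
    "Word Processor A": [".doc", ".txt"],
    "Image Viewer B": [".jpeg"],
    "Office Suite C": [".doc", ".ppt"],
}
print(invert_readable_formats(readable)[".doc"])
```

As noted in the thread, this still does not say which of several candidate programs should be preferred for a given format.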
Please make some test edits.--Ymblanter (talk) 22:23, 27 January 2017 (UTC)
I am not really happy with this performance--Ymblanter (talk) 00:00, 5 March 2017 (UTC)

DiscogsBot

Operator: Ocram89 and AndreaNocera

Task/s: Update Wikidata entries using the Discogs (Q504063) artists dump (only data marked "Complete and Correct").

Code: The code will hopefully be uploaded to GitHub in a couple of days.

Function details: The bot uses a filtered XML data dump of artists from Discogs (Q504063); only records whose <data_quality> element is "Complete and Correct" are used. Once the data is parsed, the bot checks whether an entity already exists. This is done with a SPARQL query that retrieves every musician (Q639669) or musical ensemble (Q2088357) whose name (or alias, or name variation) matches the one in the XML dump. If the entity already exists, new statements can be added (e.g. if a band lacks its members, they can be added to the entity using member of (P463)); if the entity does not exist, a new item is created. If more than one entity has the same name, nothing is changed, to avoid adding wrong statements by mistake. --DiscogsBot (talk) 11:32, 12 December 2016 (UTC)
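The three-way rule in the function details (no match, one match, several matches) can be sketched as follows. This is only an illustration of the decision step, assuming the SPARQL lookup has already returned a list of candidate item IDs; the names `Action` and `decide` are hypothetical and not taken from the bot's (unpublished) code:

```python
from enum import Enum


class Action(Enum):
    CREATE = "create a new item"
    UPDATE = "add missing statements to the existing item"
    SKIP = "ambiguous name, change nothing"


def decide(candidate_items):
    """Map the number of items matching an artist name to the bot's action."""
    if not candidate_items:
        return Action.CREATE
    if len(candidate_items) == 1:
        return Action.UPDATE
    # Several items share the name: leave all of them untouched.
    return Action.SKIP
```

The rule is conservative on ambiguity only; an artist whose item carries a different label or alias would still fall into the "create" branch, which is the duplicate risk the test edits are meant to expose.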

Could you do a few test edits? Which statements, labels and descriptions will you add to a newly created item? --Pasleim (talk) 13:09, 12 December 2016 (UTC)
We are doing tests on test.wikidata. A new item will have a label, a description and aliases, plus the statement Discogs artist ID (P1953) and, if it's a band, all its members, or, if it's a member of a group, the name of the group. We are also trying to analyze the profile to get some other data such as instruments, occupation etc. AndreaNocera (talk) 13:26, 14 December 2016 (UTC)
The edits done on test.wikidata.org look good. But I would still prefer that you do around 100 edits here on Wikidata to see whether you can reliably detect already existing artist items. --Pasleim (talk) 19:51, 15 January 2017 (UTC)

DoctorBot

Operator: DoctorBud

Task/s: Import ZFIN gene information and create (or augment) a corresponding Item in Wikidata

Code: Experimental and not yet public

Function details:

  • Import TSV data from http://zfin.org/downloads/gene.txt
  • Extract two columns of that data, one which will identify an Item (a Gene), the other a property of that Gene
  • Create an Item in Wikidata for the Gene
  • Create a Statement in Wikidata that binds the property to the Gene

--DoctorBot (talk) 03:00, 27 November 2016 (UTC)
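The first two steps listed above (read the TSV, keep two columns) could look roughly like this. It is a sketch only: the column positions are parameters because the layout of gene.txt is not described in the request, and the function name is hypothetical:

```python
import csv
import io


def extract_pairs(tsv_text, id_col=0, value_col=1):
    """Return (gene identifier, property value) pairs from tab-separated text.

    id_col and value_col are assumed column positions; adjust them to the
    actual layout of the downloaded file.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) <= max(id_col, value_col):
            continue  # skip blank or truncated lines
        gene_id = row[id_col].strip()
        value = row[value_col].strip()
        if gene_id and value:
            pairs.append((gene_id, value))
    return pairs
```

Each pair would then drive one item lookup or creation and one statement; checking for an existing item before creating a new one is left out here but is the part reviewers usually ask to see in test edits.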

@DoctorBot: The bot owner must use a different account from the bot itself.--Jasper Deng (talk) 03:00, 28 November 2016 (UTC)

DoctorBud is now declared as the Operator of DoctorBot in this Request.

Could you please make some test edits?--Ymblanter (talk) 16:03, 8 December 2016 (UTC)
@DoctorBud, DoctorBot: Are you still interested in this request?--Jasper Deng (talk) 08:44, 20 December 2016 (UTC)
@Jasper Deng: Yes, I'm still working on DoctorBot's code, but my request for DoctorBot being a Bot operated by DoctorBud is still important, if that's what you are asking. Thanks. --DoctorBud (talk) 00:51, 21 December 2016 (UTC)

WikiLovesESBot

Operator: Discasto

Task/s: Miscellaneous tasks associated to photo upload campaigns promoted by WM-ES:

  • Assignment of Commons categories to items handled in the campaigns (for example Wiki Loves Earth, Wiki Loves Folk, Wiki Loves Monuments, Photographs from Spanish Municipalities without pictures, and the like).
  • Sourcing of statements for items handled in the campaigns...

Code: Global repository is in here. Bot code is here.

Function details: The bot takes as input a series of lists (so-called annexes in the Spanish Wikipedia, see example here) and extracts the necessary information: mainly the Wikidata item and the Commons category. If found, the bot does as follows:

  • Looks up the Wikidata item.
  • Determines whether a "Municipality of Spain" statement is available in the P31 claim. If not, it creates the statement. If available, the statement is sourced to Spanish Wikipedia.
  • If the source (the list in the Spanish Wikipedia) provides a category, the bot determines whether a claim for Commons-category is available. If not, it creates the claim. If available, the claim is sourced to Spanish Wikipedia.
  • Finally, a commons sitelink for the category provided in the source is inserted if not available. If a gallery was already provided as commons sitelink, it's not modified.
  • Inconsistencies are logged during the process.

--Discasto (talk) 10:25, 3 July 2016 (UTC)
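The per-item steps listed above can be summarised as two small decisions. The sketch below is an illustration, not the bot's code: the function names are hypothetical, and treating a differing existing Commons category as an inconsistency to log is an assumption, since the request only says that inconsistencies are logged:

```python
def plan_claim_edit(existing_values, wanted_value, single_valued=False):
    """Decide what to do for one property of one item.

    existing_values: values currently on the item for that property.
    single_valued: True for properties expected to hold one value
    (assumed here for the Commons category claim).
    """
    if wanted_value in existing_values:
        return "add_reference"  # statement exists: source it to Spanish Wikipedia
    if single_valued and existing_values:
        return "log_inconsistency"  # assumption: a conflicting value is only logged
    return "create_claim"


def plan_sitelink(current_commons_sitelink, category):
    """Return the sitelink to insert, or None when one is already present."""
    if current_commons_sitelink:
        return None  # an existing gallery or category sitelink is not modified
    return "Category:" + category
```

For example, an item whose P31 values do not include the municipality class gets `create_claim`, while one that already has it gets `add_reference`.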

Support I strongly support this request. --Rodelar (talk) 22:04, 3 July 2016 (UTC)
Support I also support. --Harpagornis (talk) 15:00, 4 July 2016 (UTC)
Support I support this request. Ivanhercaz (talk) 16:17, 4 July 2016 (UTC)
Support I support this request. --Bauglir (talk) 16:28, 4 July 2016 (UTC)
Support I support this request. --ElBute (talk) 16:47, 4 July 2016 (UTC).
Support I support this request.--Pedro J Pacheco (talk) 20:14, 4 July 2016 (UTC)
Support The bot operator is reliable and knows what he does Poco2 21:07, 4 July 2016 (UTC)
Support I support this request. The operator has done a good work with other bots in different projects. --Millars (talk) 15:47, 5 July 2016 (UTC)
Support I support this request. --Dorieo (talk) 17:41, 6 July 2016 (UTC)
Sorry, Jura, I missed your comment. I have to say that I don't fully understand it (mainly the part related to the mismatch in the number of municipalities). With regard to the second part, I will patch the code to consider subclasses as well. However, parroquia (Q3333265) does not apply, as a parroquia is a subdivision of a municipality. The lists we're handling have been reviewed several times by the WM-ES members and all the items are actually municipalities. Smaller subdivisions can be considered in future editions, but not now. Therefore, my only concern relates to the subclasses (I didn't actually consider that possibility). Best regards --Discasto (talk) 22:02, 22 July 2016 (UTC)
And I didn't notice your answer. There are several possible reasons for the mismatch in the number of municipalities: we could have already an item for the municipality, but it just isn't linked to eswiki. The easiest way to solve this would be to add the statements and then check the result for duplicates (it could also be done in advance, but this may be more complicated).
As far as "concejo of Asturias" is concerned, you could add both or replace it. Whatever suits interested editors best.
The "parroquia" question seems minor (11 items currently): If you look at the query result you will notice that some items have this in P31 in addition. This can mean that the article in some other wiki is about the parroquia or there is some other mixup. These items may need to be split.
--- Jura 08:42, 17 August 2016 (UTC)


  • It'll be great if some active editors of Wikidata could give their opinions. Canvassing of users with a low amount of contributions doesn't help. Sjoerd de Bruin (talk) 14:55, 7 July 2016 (UTC)
I took a look at contribs - it looks like a lot of entries have already been made, but the bot was blocked as unapproved. From my review of the entries made the bot seems to be operating reasonably. However, adding a reference of "imported from xx wikipedia" is barely better than no source at all, I'm not sure this is really helpful. If there's an actual es.wikipedia.org page that is the source of the information, providing that via "reference URL" and "retrieved on" properties would be more useful. An external source for this data would be much better. ArthurPSmith (talk) 14:42, 8 July 2016 (UTC)
I have no strong opinion on this. I do agree on providing an external source if available. It's not the case in most of the situations we're handling. Therefore, I'll simply skip this step. In fact, the core functionality (which I'm currently doing by hand) was related to setting commons categories. As we're handling all the items in the list, it seemed sensible to add sources. If you feel it's useless (unless a proper source is provided), I'll skip this step. Thanks for providing feedback --Discasto (talk) 22:45, 12 July 2016 (UTC) PS: yes, it's been blocked in the middle of a task that nowadays I have to do by hand. I don't really understand this block. Seems to me the typical bureaucratic behaviour that harms more than helps
I am going to approve the bot tomorrow provided there have been no objections.--Ymblanter (talk) 09:46, 13 July 2016 (UTC)
It would be good to have an answer to my question. We don't want to end up with even more duplicates.
--- Jura 12:34, 15 July 2016 (UTC)
@Ymblanter, Discasto: Please see my comment above.
--- Jura 08:43, 17 August 2016 (UTC)
@Jura1: I saw it weeks ago (and I answered :-), see answer on 22 July... I assumed you had this page in your watch list) --Discasto (talk) 08:52, 17 August 2016 (UTC)
@Discasto: Well, generally I notice, but here I missed it. Bot requests aren't exactly my preferred stuff ; ). Did you notice my comment from today?
--- Jura 08:54, 17 August 2016 (UTC)

Comment I drop this request. However, may I ask for the account to be unblocked? It will not be active, but keeping it blocked honestly seems like overkill. Best regards --Discasto (talk) 21:50, 23 August 2016 (UTC)

@Jura1:, @Discasto:: The task seems useful, is there any chance you can agree and proceed with the task?--Ymblanter (talk) 07:59, 24 August 2016 (UTC)
I think it's essentially a question of checking the result. This could be done after addition. A way to flag former municipalities needs to be determined (by end date and/or with some former municipality (Q19730508) item). In the meantime, Abián is working with Spanish municipalities (Wikidata:Bot_requests#Mayors_of_Spain).
--- Jura 08:42, 26 August 2016 (UTC)

MatSuBot 6

Operator: Matěj Suchánek

Task: Convert HTML entities in terms and maybe statements to regular text.

Code: Not yet decided on the implementation.

Function details: The biggest problem at the moment is querying for items which have such errors (if I don't find any other possibility, I will try to combine SQL and PWB). --Matěj Suchánek (talk) 19:12, 1 July 2016 (UTC)

Please make some test edits.--Ymblanter (talk) 14:47, 5 July 2016 (UTC)
For your information, I put this On hold since I am not able to query for the items. I hope to find a solution in the near future. Matěj Suchánek (talk) 18:49, 20 June 2017 (UTC)

1-Byte-Bot

Operator: 1-Byte

Task/s: Import census data from the Turkish Statistical Institute.

Code: Based on pywikibot

Function details:

--1-Byte (talk) 15:22, 2 March 2016 (UTC)

Update: Currently on hold as it's not entirely clear how to cite the data. --1-Byte (talk) 08:58, 3 March 2016 (UTC)

Phenobot

Operator: Jjkoehorst

Task/s: The first step will be to improve the lineage annotation of organisms, including taxon identifiers, correct species names and corresponding references, using the UniProt Taxonomy database. The next step will be to add missing organisms to Wikidata along with phenotypic information such as biosafety level, oxygen requirements and other features. Ongoing discussion can be found at User:Phenobot/Discussion

Code: https://bitbucket.org/jjkoehorst/wikidatabots

Function details: This bot is based on the ProteinBoxBot framework. It will use the UniProt Taxonomy SPARQL endpoint for data extraction. Initially it will work on completing existing entries as far as possible with correct names and taxon identifiers, and missing species will be added to WD. For strains with existing phenotypic information, this can be complemented from various sources which are currently under investigation, such as GOLD or DSMZ. --jjkoehorst (talk) 15:13, 4 February 2016 (UTC)
Abbe98
Achim Raschka (talk)
Brya (talk)
Dan Koehl (talk)
Daniel Mietchen (talk)
Delusion23 (talk)
Faendalimas
FelixReimann (talk)
Infovarius (talk)
Joel Sachs
Josve05a (talk)
Klortho (talk)
Lymantria (talk)
Michael Goodyear
MPF
PhiLiP
Andy Mabbett (talk)
Prot D
pvmoutside
Rod Page
Soulkeeper (talk)
Tinm
Tommy Kronkvist (talk)
TomT0m
Notified participants of WikiProject Taxonomy

@Succu: Can you have a look at this request? --Pasleim (talk) 10:32, 5 February 2016 (UTC)
I have some problems with the task "correct species names". NCBI is not a nomenclatural database. It contains spelling errors, like other databases do. And I have problems with this kind of sourcing. The NCBI ID is already referenced; nothing is imported from UniProt. The disclaimer tells us „The NCBI taxonomy database is not an authoritative source for nomenclature or classification - please consult the relevant scientific literature for the most reliable information.“ --Succu (talk) 11:40, 5 February 2016 (UTC)
Here the Bot removed taxon name (P225). --Succu (talk) 12:10, 5 February 2016 (UTC) PS: Pseudomonas putida 10-23 (Q22661287) P225 is missing. --Succu (talk) 07:24, 6 February 2016 (UTC)
I agree with Succu. Why go change species names, based on UniProt? Could do serious damage. And indeed that kind of sourcing is unwanted and adds nothing: database is slow enough as it is. - Brya (talk) 11:58, 5 February 2016 (UTC)
This proposal does not seem to be mature. The Uniprot taxonomy database is a customized version of the NCBI taxonomy database, which itself is not reliable for taxonomy anyway. It is currently not clear if the bot owner knows enough about taxonomy and nomenclature to understand the issues associated with Wikidata taxon items. Also the proposed use of imported from (P143) does not seem appropriate.
Nevertheless my understanding is that many of this bot's contributions would be made in microbiology, and the issues would be a little different if its contributions were limited to this area. Otherwise I see no reasons to prevent the bot from adding “biosafety levels, oxygen requirements and other [such] features”.
Tinm (d) 18:29, 5 February 2016 (UTC)
Yes, the main basis of this bot will be within microbiology, and I can restrict the bot to prokaryotes. About the naming: what I am currently doing is leaving the name alone if it exists in the UniProt taxonomy as either an other name or a scientific name. But I can leave the name as it is, since I am mostly relying on the taxonomic identifier from NCBI/UniProt. My main priority is to have the NCBI Taxonomy identifier correct and filled in, so that I can include the phenotypic characteristics and can also easily verify whether an organism page has been created and, if not, create one. I can also skip adding references if one is already available. --jjkoehorst (talk) 06:45, 6 February 2016 (UTC)
Yes, this taxon name is pretty bad. And again, the fact that the rank is that of species does not need a reference (this is so by definition), and as there is a link to NCBI, the fact that the taxon name is accepted by NCBI does not need to be repeated in the form of a reference to taxon name. - Brya (talk) 07:44, 6 February 2016 (UTC) -also beyond understanding - Brya (talk) 07:51, 6 February 2016 (UTC) - And "instance of taxon" means that "taxon name" is present in the item. UniProt cannot know anything about that, so adding a reference to "instance of taxon" is pure misrepresentation. - Brya (talk) 07:58, 6 February 2016 (UTC)
Sorry about the naming. I'll restrict the bot to prokaryotes only, if you prefer, and to updating only missing names and NCBI Taxonomy information. When that works out well I'll make some property requests for the phenotypic information as stated earlier, OK? --jjkoehorst (talk) 09:31, 6 February 2016 (UTC)
If that means 1) only missing names of prokaryotes and 2) sourcing only for NCBI Taxonomy information, then yes, OK. - Brya (talk) 13:00, 6 February 2016 (UTC)
Looks like the databases are out of sync. NCBI Taxonomy ID (P685)=208964 gives Pseudomonas aeruginosa PAO1 (www.ncbi.nlm.nih.gov/taxonomy) and Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228) (www.uniprot.org/taxonomy). This explains „adjustments“ like this one. --Succu (talk) 11:05, 6 February 2016 (UTC)
Looks like UniProt provides five separate names, rolled up into one entry? - Brya (talk) 13:00, 6 February 2016 (UTC)
There is a mapping between NCBI Taxonomy ID (P685) and a so called „Official (scientific) name“ used by UniProt. So maybe we need a qualifier for P685 to indicate this name. --Succu (talk) 16:44, 6 February 2016 (UTC)
Yes I had an email conversation with uniprot and this was a reply about that case: The idea is not to use a concise name. A same strain may be known by different names because it has been deposited in different organizations (institutions, private companies, etc) with different names. So we try to track these co-identical strain names used by the major concerned organizations for a specific strain. This name is stored as scientifcName and all the variances are stored among other names. --jjkoehorst (talk) 19:30, 6 February 2016 (UTC)
So what's your conclusion? BTW: I stumbled over User:Phenobot/Discussion, which looks like an outline of the intended bot task, but not mentioned here. --Succu (talk) 20:28, 6 February 2016 (UTC)
Well, in one way it makes sense to use a general nomenclature which encapsulates all possible extra names, but it is not the true scientific name. Maybe a taxon synonym name entry could be used which lists the other names belonging to this organism. Yes, the discussion page is for discussing the roadmap after the general taxon identification and naming is completed. Sorry that I did not mention it here, but in my opinion it was not complete yet; feel free to comment on it if you like... --jjkoehorst (talk) 08:02, 7 February 2016 (UTC)
Strictly speaking these are not scientific names at all. The ICNP does not cover names at a rank lower than subspecies. AFAIK there is no formal system for naming strains, so this may well happen on an ad hoc basis, or according to a local standard. In fact, it would help somewhat not to put these in "taxon name". - Brya (talk) 08:29, 7 February 2016 (UTC)
Then I would suggest that the names currently in WD should correspond to the NCBI nomenclature or to any of the UniProt names (scientific names/other names). If this is not the case and there is no reference available, then it should be the scientific name from either NCBI or UniProt. What do you think? And where would you place the other names? As a common name or something else? --jjkoehorst (talk) 08:40, 7 February 2016 (UTC)
? The names in NCBI/Uniprot are not scientific names (not regulated by a Code of nomenclature). The most obvious way to handle strains would be to have a property "strain name" (perhaps to be combined with "parent taxon", etc). - Brya (talk) 09:33, 7 February 2016 (UTC)
My considerations are the same. --Succu (talk) 10:10, 7 February 2016 (UTC)
I agree a strain property should then be created which specifies the name of a strain? However taxon name then becomes obsolete for strains at least if I am correct. The elements that are obligatory for strains are then parent taxon, taxon rank, NCBI Taxonomy ID, general labels and instance of. Anything that else that can be used with the current properties? --jjkoehorst (talk) 11:49, 7 February 2016 (UTC)
Yes, this new property should be used instead of P225. This would reduce "Format" violations of P225 too. --Succu (talk) 12:54, 7 February 2016 (UTC)
Sounds good, who is going to propose for a new property for taxon name and can this taxon name then also contain multiple values, such as synonyms of the strain name or should another property be made for that? --jjkoehorst (talk) 14:47, 7 February 2016 (UTC)
I think we need a second property, UniProt name, to model the relationship to the NCBI ID. In the case of strains we could use aliases to add the name variants. You can propose them at Wikidata:Property proposal/Natural science. --Succu (talk) 18:49, 7 February 2016 (UTC)
A property "UniProt" to link to the UniProt-entries may be handy. Not sure what else you mean, as UniProt-entries may concern regular taxa as well as strains and whatever else UniProt includes. - Brya (talk) 06:40, 8 February 2016 (UTC)
I am not much in favour of multiple names in one item, and including out-of-use names beside the current name seems like a recipe for disaster. But we really do need a separate property "taxon synonym (string)" beside the present "taxon synonym [item]". - Brya (talk) 15:53, 7 February 2016 (UTC)
Yes we should request for a taxon synonym string variant. Then by default it would be the scientific name of the NCBI nomenclature if no better name is available? --jjkoehorst (talk) 19:50, 7 February 2016 (UTC)
Synonyms are an area full of hidden dangers. What we may really need are:
  • "taxon synonym, homotypic (item)"
  • "taxon synonym, heterotypic (item)"
  • "taxon synonym, homotypic (string)"
  • "taxon synonym, heterotypic (string)"
Especially heterotypic synonyms may vary strongly, depending on point of view (references!). Brya (talk) 06:40, 8 February 2016 (UTC)
I looked into: taxon common name (P1843) which is a common name for a given taxon. As basis we could use the NCBI nomenclature for strains (and/or others?). And over time add the homotypic/heterotypic naming. Shall I run a test with the restricted settings I have now? Only bacteria, no name updating if there is a name available and no reference adding if the value is already present? --jjkoehorst (talk) 08:01, 8 February 2016 (UTC)
@Brya: Regarding how to handle synonyms, I have thought of a way of doing things that would solve a very big part of the issues we encounter with the current one. I'm going to make a post about that on the project talk page when I'll have a bit of time. It would imply significant changes but I really believe it would answer many issues efficiently. Anyway, I guess you will see when I put it up. —Tinm (d) 02:34, 9 February 2016 (UTC)
I will be most interested to see what you come up with. - Brya (talk) 06:13, 9 February 2016 (UTC)

Greetings all. I am part of the GeneWiki team and I am adding genes and proteins for bacteria under our MicrobeBot account; see the MicrobeBot Task Page. For my project it is important that there remain distinct strain items with NCBI taxonomy identifiers, so that I can link genes and proteins to them via found in taxon (P703). Just a thought, but could we distill some of the views here into a mockup of a Wikidata strain item in the table below, using Pseudomonas aeruginosa PAO1 (Q21065234) as an example? I added some of the basics that are there for strain items now. I personally think a new 'NCBI strain name' type of property would be a good thing to have, as these strain names are directly linked to the NCBI Taxonomy ID. Putmantime (talk) 18:46, 9 February 2016 (UTC)

Property | Description | Datatype | Expected value (if not listed, see property definition)
P225 | taxon name | String | Species name? From NCBI, UniProt?
P??? | strain name | String | Strain name from NCBI, UniProt, etc.
P171 | parent taxon | Item | Bacterial species item, e.g. Pseudomonas aeruginosa (Q31856)
P105 | taxon rank | Item | Strain, e.g. strain (Q855769)
P685 | NCBI Taxonomy ID | String | 208964

What we are talking about is this:

Property | Description | Datatype | Expected value (if not listed, see property definition)
P??? | strain name | String | Strain name from NCBI, UniProt, etc., e.g. Pseudomonas aeruginosa PAO1 (Q21065234)
P171 | parent taxon | Item | Bacterial species item, e.g. Pseudomonas aeruginosa (Q31856)
P105 | taxon rank | Item | Strain, e.g. strain (Q855769)
P685 | NCBI Taxonomy ID | String | 208964
P??? | UniProt ID | String | from UniProt, different from UniProt ID (P352)

- Brya (talk) 04:42, 10 February 2016 (UTC)

I agree. P225, P1420 and P1843 should not be taken from NCBI or UniProt. No items should be created on this basis. --Succu (talk) 06:51, 10 February 2016 (UTC) PS: I added UniProt ID (P352) and now miss something like UniProt name. --Succu (talk) 08:02, 10 February 2016 (UTC)
Not sure what you mean by "UniProt name". Is this something like "Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228)", which to me does not look like a name but five names, for what may be (deemed to be) one strain. - Brya (talk) 11:39, 10 February 2016 (UTC)
Yes, the so called „Official (scientific) name“ used by UniProt mapped to NCBI Taxonomy ID (P685). --Succu (talk) 12:01, 10 February 2016 (UTC)
It is a long list, and many names are regular scientific names. Could you point out a few examples? - Brya (talk) 12:07, 10 February 2016 (UTC)
  • 634452 ← Acetobacter pasteurianus (strain NBRC 3283 / LMG 1513 / CCTM 1153)
  • 4024 ← Acer saccharum
  • 441768 ← Acholeplasma laidlawii (strain PG-8A)
  • 237531 ← Actinomycete sp. (strain K97-0003)
  • 928294 ← Human adenovirus C serotype 1 (strain Adenoid 71)
  • 262698 ← Brucella abortus biovar 1 (strain 9-941)
  • 48984 ← Pantoea agglomerans pv. gypsophilae
  • 45222 ← Parana mammarenavirus (isolate Rat/Paraguay/12056/1965)
--Succu (talk) 12:23, 10 February 2016 (UTC)
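The examples above show why these UniProt names read as several names rolled into one: a base name followed by a parenthesised group of designations. A sketch of how such a string could be taken apart, assuming the pattern seen in the listed examples (a trailing "(strain …)" or "(isolate …)" group with designations separated by " / "); the function name is hypothetical and the pattern is not claimed to cover the whole UniProt list:

```python
import re

_NAME_RE = re.compile(r"(?P<base>.+?)\s*\((?P<kind>strain|isolate)\s+(?P<names>[^()]+)\)")


def split_uniprot_name(name):
    """Split a UniProt-style organism name into (base name, kind, designations)."""
    match = _NAME_RE.fullmatch(name.strip())
    if match is None:
        return (name.strip(), None, [])  # plain name such as "Acer saccharum"
    # Only a slash surrounded by spaces separates designations, so an
    # isolate like "Rat/Paraguay/12056/1965" stays in one piece.
    designations = [part.strip() for part in re.split(r"\s+/\s+", match["names"])]
    return (match["base"], match["kind"], designations)
```

Applied to "Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228)" this yields the base name plus five separate designations, which could then go into aliases or a future strain-name property rather than into taxon name (P225).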

But not all these names are unique to UniProt. For example, Acer saccharum is a regular botanical name, and Pantoea agglomerans pv. gypsophilae appears to be in fairly widespread use, as is Brucella abortus biovar 1 (strain 9-941). - Brya (talk) 17:32, 10 February 2016 (UTC)

My thought was that jjkoehorst wants to integrate these names somehow. If the speclist is important for the planned bot's job I can provide some statistics. --Succu (talk) 18:36, 10 February 2016 (UTC)
Eventually I would like to create a most comprehensible but still useful taxonomy resource where people can easily search for organisms and their phenotypic characteristics, and where, when a new strain is sequenced, its information can easily be integrated into WD according to a defined data model. However, a solid foundation needs to be established for this first, and that is what I was thinking of. In general the primary identifier is the NCBI taxonomic number, which can be complemented with information from NCBI scientific names and UniProt scientific/other names. If this would clearly introduce too many errors, or does not fit the idea of how we should define a strain, then that is perfectly fine with me. What has been driving me from the beginning is that I want to connect phenotypic information from multiple resources to taxonomic identifiers and the corresponding genetic makeup. I can of course do this on my own machine in my own little project, and that would work out fine, but no one else could benefit from it, and that's why I started working on the idea of this phenobot (hence the name...). In the bot's discussion page mentioned by Succu I am expanding this idea further with the phenotypic characteristics that I can get my hands on and that could theoretically be integrated into WD, but I am still writing this up at User:Phenobot/Discussion. --jjkoehorst (talk) 21:04, 10 February 2016 (UTC)
As an example these are statements that would be interesting to add. Not all have properties and I am preparing for that.
Property | Description | Datatype | Expected value
P1604 | biosafety level | Item | Level 1: biosafety level 1 (Q18396533); Level 2: biosafety level 2 (Q18396535); Level 3: biosafety level 3 (Q18396538); Level 4 ... see Pseudomonas putida KT2440 (Q21079489)
P2043 | length / size | string | 902320 bp, base pairing (Q21481789)
P??? | GC content | float |
P??? | Gram staining | item | Gram positive: Gram-positive (Q857288); Gram negative: Gram-negative (Q632006)
P??? | Pathogenic to | item | Human, Plant, Animal, etc...
P??? | Motility | item | Chemotactic (Chemotaxis): chemotaxis (Q658145); Motile: Triton (Q3359); Nonmotile: (not yet found)
P??? | Environment | item or string | soil, seawater, marine sediment, forest soil, etc...
P??? | Temperature range | item | Hyperthermophile: Hyperthermophile (Q1784119); Mesophile: Mesophile (Q669652); Psychrophile: Psychrophile (Q913343); Thermophile: Thermophile (Q834023)
P2076 | Temperature (optimal temperature) | | Pseudomonas putida KT2440 (Q21079489)

--jjkoehorst (talk) 09:11, 11 February 2016 (UTC)

If all that is to be included in an item, it becomes understandable that Succu would like a UniProt name, and (presumably?) a separate item for each such UniProt entity. - Brya (talk) 17:26, 11 February 2016 (UTC)
If I understand you correctly you mean to store the Biosafety/Gram/Temp/etc.. in a UniProt item? These are generic features from different sources (DSMZ/GOLD/etc) and are linked via the NCBI Taxonomy ID and in that case would not make sense to store these items under a uniprot name entry. --jjkoehorst (talk) 19:46, 11 February 2016 (UTC)

Back to the roots

Oppose: Back to the roots. „Code“ is protected. I see no reactions to error reports. The task is obscure. jjkoehorst, please roll back your bot's contributions. --Succu (talk) 22:32, 11 February 2016 (UTC)

The code is unlocked and all revisions have been rolled back. Please let's continue discussing what kind of shape would be acceptable for phenotypic information --jjkoehorst (talk) 06:51, 18 February 2016 (UTC)

I think there is great value in elements of what is proposed, and it would make the microbial data on Wikidata a much richer resource. Metadata such as biosafety level, Gram -/+ etc. would be very useful, but UniProt may not be the best source for taxonomy identifiers and names. I think it would benefit this proposal to have a clear picture of what the scope of the project would be, and a clear definition of each bot task. Putmantime (talk) 23:16, 11 February 2016 (UTC)

Putmantime, mind to help? --Succu (talk) 23:21, 11 February 2016 (UTC)
Succu Yes definitely...can we keep the discussion going on this proposal? I think it has merit, but needs to be clearer. The naming issue for subspecies items seems to have thrown a wrench in things. I think NCBI is a good authority for strain names personally, because the name was submitted by the researcher that submitted sequence data to NCBI, and that is when the NCBI Taxonomy ID was generated as well as genome IDS. Not a scientific name though or consistently formatted. I view it as an appropriate label, and maybe a new 'strain name' property, but see it shouldn't be a taxon name. Any synonyms could be aliases, IMHO Putmantime (talk) 23:34, 11 February 2016 (UTC)
I am in the process of rolling back the changes made by the bot. I think the focus of the conversation has shifted towards the naming issues, which still exist and need to be discussed thoroughly. Currently existing names will not be modified by the bot; its main focus is on the metadata that is available from various resources through the NCBI taxonomic identifier, which will not interfere with current information. I know that I initially started with the naming, but the main focus is on the metadata. Hopefully we can keep the discussion going on the naming scheme and microbial metadata and come to a good agreement that improves the quality of information in Wikidata. --jjkoehorst (talk) 17:36, 12 February 2016 (UTC)
In the NCBI Taxonomy strains have no rank. We should find a consensus that stating taxon rank (P105)=strain (Q855769) is OK. Otherwise we can use instance of (P31)=strain (Q855769) with taxon rank (P105)=novalue. --Succu (talk) 18:51, 12 February 2016 (UTC) E.g. Shigella flexneri 2a str. 301 (Q21102941), Putmantime. --Succu (talk) 22:13, 12 February 2016 (UTC)
There are similar cases elsewhere: "virus" as a subspecific entity is not regulated by a Code of nomenclature. This goes also for "forma specialis", "pathovar", etc. We should have a structure for this. - Brya (talk) 06:17, 13 February 2016 (UTC)
Yes we should. If I remember right f.sp. is used by IF and MycoBank as a rank. Strongly related to this bots task is the question of Candidatus (Q857968). --Succu (talk) 19:18, 13 February 2016 (UTC)
Yes, forma specialis is used by IF and MycoBank as a rank, but that does not make it a rank. And, yes, "Candidatus" is a similar problem case. - Brya (talk) 09:55, 14 February 2016 (UTC)

Hkn-bot

Hkn-bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: HakanIST (talk • contribs • logs)

Task/s: Clean up invalid authority property links in person items; harvest dates of birth from articles

Code: Based on the addwiki framework (PHP); currently being written

Function details: The bot will periodically check P2458, P2446, P2447, P2448 and P2449 to see whether the specified IDs are valid by visiting the generated URL. If an ID cannot be validated, the bot will remove the claim and add it to a report. Such errors often occur because of invalid data at the source wiki or because a footballer ID was mismatched with a manager ID. Secondly, the bot will harvest birth dates from the imported wikis for items with these properties. Using variations of this wdq-generated list, it will add the date of birth property if there is none.

-- Hakan·IST 18:50, 16 January 2016 (UTC)
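The validity check described in the function details could be sketched roughly as follows. This is only an illustration in Python, not the operator's addwiki/PHP code: the URL patterns are placeholders (the real pattern would come from each property's formatter URL), `find_invalid` and its helpers are hypothetical names, and treating any non-200 answer as "invalid" is an assumption. The sketch only reports suspect claims; it does not remove anything.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# Placeholder URL patterns, for illustration only; "$1" marks where the
# identifier is substituted. Real patterns are NOT the ones shown here.
FORMATTER_URLS: Dict[str, str] = {
    "P2446": "https://example.org/player/$1",
    "P2447": "https://example.org/manager/$1",
}

Claim = Tuple[str, str, str]  # (item id, property id, external identifier)


def build_url(prop: str, identifier: str) -> str:
    """Substitute the identifier into the property's URL pattern."""
    return FORMATTER_URLS[prop].replace("$1", identifier)


def http_status(url: str) -> int:
    """Return the HTTP status code for url (performs a real request)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def find_invalid(claims: List[Claim],
                 fetch: Callable[[str], int] = http_status) -> List[Claim]:
    """Return the claims whose generated URL does not answer with 200."""
    return [c for c in claims if fetch(build_url(c[1], c[2])) != 200]
```

Passing the fetch function as a parameter keeps the check testable without network access, and makes it easy to add throttling later.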

Please make some test edits.--Ymblanter (talk) 08:18, 20 January 2016 (UTC)
@Ymblanter: Ran the bot for the second task (harvesting date of birth from articles) on 20 items, throttled to 10 seconds per change. Hkn-bot contribs.-- Hakan·IST 15:54, 20 January 2016 (UTC)
I see that e.g. here you added data but they are unreferenced. Is there any way to add a reference as well?--Ymblanter (talk) 16:36, 20 January 2016 (UTC)
I've been working on adding references for some time now, but have not got it to work yet.-- Hakan·IST 21:30, 20 January 2016 (UTC)
Your bot added a date of birth of "1 October 1987" for Irakli Chirikashvili (Q3801812), despite the itwiki page stating "10 gennaio 1987" (10 January 1987). I'm afraid a similar issue occurred on a few other items, for instance Q5889913 or Q6771150. --Alphos (talk) 22:37, 20 January 2016 (UTC)
These errors were caused by wrong Transfermarkt player ID (P2446) statements, so the bot is not malfunctioning. It would be however really good if you could add references. This helps, amongst others, to spot the error source more easily. --Pasleim (talk) 20:02, 15 January 2017 (UTC)

Dexbot

Dexbot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Ladsgroup (talk • contribs • logs)

Task/s: Auto-transliterating for names of humans

Code: Based on pwb; will probably be published soon.

Function details: The code analyses dumps of Wikidata and can create an auto-transliterating system for any given pair of languages based on that. I started with Persian and Hebrew (some edits for test [19] [20]) --Amir (talk) 18:14, 7 April 2015 (UTC)
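To make the idea of learning transliterations from existing label pairs concrete, here is a deliberately naive sketch. It is not Dexbot's code, which had not been published at this point: the word-by-word alignment, the "most frequent rendering wins" rule, and the function names `learn_table` and `transliterate` are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def learn_table(label_pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each source-language word to its most frequent target rendering.

    label_pairs holds (source label, target label) tuples, e.g. taken from
    items for humans that already have a label in both languages.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src, dst in label_pairs:
        src_words, dst_words = src.split(), dst.split()
        if len(src_words) != len(dst_words):
            continue  # cannot be aligned word by word, so skip the pair
        for s, d in zip(src_words, dst_words):
            counts[s][d] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def transliterate(name: str, table: Dict[str, str]) -> Optional[str]:
    """Transliterate name, or return None if any word is unknown."""
    words = name.split()
    if not all(w in table for w in words):
        return None  # better to skip an item than to guess
    return " ".join(table[w] for w in words)
```

Returning `None` for names with unknown words mirrors the cautious behaviour one would want from such a bot: skip the item rather than add a doubtful label.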

  • Comment, please let me know when you try your system for some Cyrillic language. I'd like to see it myself. --Infovarius (talk) 14:10, 8 April 2015 (UTC)
@Infovarius: I work with pairs of languages, like fa and he (for which the bot adds a Persian transliteration based on the Hebrew one and vice versa). Which pair of languages do you suggest? en and ru? Amir (talk) 11:54, 9 April 2015 (UTC)
Probably you should have stated this in your request. Your phrase "I started with" has encouraged me :) No, I don't suggest Russian as I understand the complexity of the task. --Infovarius (talk) 13:16, 10 April 2015 (UTC)
@Infovarius: I don't think Russian is too complicated to attempt. I took care of lots of different issues, including country of citizenship, etc., so it's not hard for this bot. I asked you which language you think is the best one to pair with Russian *to start with*. Amir (talk) 21:11, 10 April 2015 (UTC)
Will the bot be able to detect delicate labels as in King An of Han (Q387311)? --Pasleim (talk) 19:24, 13 April 2015 (UTC)
It probably skips them or makes a correct transliteration (depending on the language), but I can't say for sure. Let me test. Amir (talk) 13:33, 15 April 2015 (UTC)
Are we ready for approval here?--Ymblanter (talk) 16:08, 15 April 2015 (UTC)
  • Just a caveat when dealing with Chinese languages: Chinese to Latin script (and vice versa) transliterations are rarely standardized. For example, Alan Turing's given name might be transliterated into 艾伦 or 阿兰 (as in the case of Alan Moore (Q205739)) or 亚伦 (as in the case of Alan Arkin (Q108283)). These Chinese characters roughly resemble "Alan" when pronounced, but due to regional differences (i.e. mainland China, Taiwan, Hong Kong, etc.), they result in different transliterations. Even when two people's names are transliterated in the same region, they can be different. There is simply no standardization on this matter. —Wylve (talk) 14:53, 23 April 2015 (UTC)
    hmm, User:Wylve: Just a question: Is it wrong to put "亚伦" for Alan in Alan Turing? Amir (talk) 12:36, 25 April 2015 (UTC)
    It's not wrong, but it might not be the only way people call Alan Turing in Chinese. The lead sentence of Turing's article on zhwiki mentions that "Alan" is also transliterated as 阿兰. —Wylve (talk) 20:48, 25 April 2015 (UTC)
    @Wylve: I made 50 auto-transliterations [21], please check and say if anything is wrong or unusual. Thanks Amir (talk) 20:05, 16 May 2015 (UTC)
    I can't verify every name, since some of those people aren't mentioned in Chinese news sources. My standard of what is "wrong" or "unusual" is whether the transliterations you've produced are used predominantly in reliable and reputable sources. It is hard to judge sometimes, as there is a variety of transliterations used. For instance:
  • Jonathan Ross is transliterated as 强纳·森罗斯 and also 喬納森·羅斯
  • Leonard B. Jordan is also transliterated as 萊昂納德·B·喬丹
  • Jimmy Bennett is also transliterated as 吉米·本内特, 吉米班奈, 吉米班奈特.
  • Jason Lee is also named 杰森·李.
  • "Scott" from A. O. Scott is also transliterated as 史考特.
All of your edits should be fine if read in Chinese, as they all sound like their English name. Also, I have found this page ([22]), which documents Xinhua News Agency (Q204839)'s official transliterations of names. These transliterations are considered official only in Mainland China. —Wylve (talk) 21:58, 16 May 2015 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Ladsgroup, Wylve: Does this look okay for an approval, or is there something we're missing? I don't speak (or read, for that matter) Chinese  Hazard SJ  05:40, 28 December 2015 (UTC)

  • Amir: Sounds cool. Regarding the he-fa pair
Tagging Amire80 and Eldad who may add some other advice. Eran (talk) 18:53, 4 January 2017 (UTC)
Well, the last time people talked on this page was a year and a half ago. I need to search to find the script and check. I'll do it soon. Amir (talk) 19:12, 5 January 2017 (UTC)
  • @Ladsgroup: Only human names? How about geographical objects (populated places, rivers, etc.)? Right now I'm thinking to transliterate manually some batches of names of Ukrainian localities and to harvest them in WD; should I leave this task for your bot?:) --XXN, 14:49, 12 May 2017 (UTC)
    I don't think the AI would be good enough to do that for now, I'm planning to use w:LSTM in near future and in that case we might do some experiments soon. Amir (talk) 14:56, 12 May 2017 (UTC)

KunMilanoRobot

KunMilanoRobot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Kvardek du (talk • contribs • logs)

Task/s:

  • Add French 'intercommunalités' on French communes items (example)
  • Add French communes population
  • Correct Insee codes of French communes

Code:

Function details: Takes the name of the 'communauté de communes' in the Insee base and adds it if necessary to the item, with point in time and source. Uses pywikipedia. --Kvardek du (talk) 19:27, 21 January 2014 (UTC)

Imo the point in time qualifier isn't valid here as the property isn't time specific. -- Bene* talk 15:10, 22 January 2014 (UTC)
point in time (P585) says "time and date something took place, existed or a statement was true", and we only know the data was true at January 1st, due to numerous changes in French organization. Kvardek du (talk) 12:18, 24 January 2014 (UTC)
Interesting, some comments:
  • Not sure that "intercommunalités" are really administrative divisions (they are built from the bottom rather than from the top). part of (P361) might be more appropriate than located in the administrative territorial entity (P131)
  • Populations are clearly needed but I think we should try to do it well from the start and that is not easy. That seems to require a separate discussion.
  • INSEE code correction seems to be fine.
  • Ideally, the date qualifiers to be used for intercommunalité membership would be start time (P580) and end time (P582) but I can't find any usable file providing this for the whole country. --Zolo (talk) 06:37, 2 February 2014 (UTC)
Kvardek du: can you add "canton" and "pays" too? (Canton is a bit complicated since some cantons contain only fractions of communes.)
Regards, VIGNERON (talk) 14:01, 4 February 2014 (UTC)
Wikipedia is not very precise about administrative divisions (w:fr:Administration territoriale). Where are the limits between part of (P361), located on terrain feature (P706) and located in the administrative territorial entity (P131)?
Where is the appropriate place for a discussion about population?
VIGNERON: I corrected Insee codes, except for the islands: the same problem exists on around 50 articles due to confusion between articles and communes on some Wikipedias (I think).
Kvardek du (talk) 22:26, 7 February 2014 (UTC)
@Bene*, Vogone, Legoktm, Ymblanter, The Anonymouse: Any 'crat to comment?--GZWDer (talk) 14:37, 25 February 2014 (UTC)
I'm still not familiar with the "point in time" qualifier. What about "start date" because you mentioned the system has changed to the beginning of this year? Otherwise it might be understood as "this is only true/happened on" some date. -- Bene* talk 21:04, 25 February 2014 (UTC)
Property retrieved (P813) is for the date the information was accessed and is used as part of a source reference. point in time (P585) is for something that happened at one instance. It is not appropriate for these entities which endure over a period of time. Use start time (P580) and end time (P582) if you know the start and end dates. Filceolaire (talk) 21:19, 25 March 2014 (UTC)

Support if the bot user uses start time (P580) and end time (P582) instead of point in time (P585) --Pasleim (talk) 16:48, 28 September 2014 (UTC)

@Kvardek du: Do you still plan to run the bot? If so, could you please do some test edits again, using start time (P580) and end time (P582) instead of point in time (P585)? --Pasleim (talk) 07:52, 24 May 2015 (UTC)
@Pasleim: it's planned, but not for the moment... The problem I have with French data is that you only have the membership at a moment t, and not with a start time (P580). Kvardek du (talk) 13:20, 25 May 2015 (UTC)
Kvardek du, then use retrieved (P813) in the reference and leave out start time (P580) and point in time (P585). Joe Filceolaire (talk) 08:33, 23 July 2015 (UTC)
Filceolaire : yeah but I have a retrieved (P813) t2 which is different from my point in time (P585)... Kvardek du (talk) 15:47, 24 July 2015 (UTC)
If you don't know the 'start time' then leave it out. If you want then you can create a separate item for the document that the data comes from and add the point in time statement to that item then reference the item for that document in the references for the 'located in ... entity' statements. Look on it as the 'point in time' date relates to the info in the document (true on that date).
Note that population figures should have a 'point in time' qualifier to say when that population figure applies since the population figure is not true for a period; it is only true for the day it was measured. Joe Filceolaire (talk) 00:55, 25 July 2015 (UTC)
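As an illustration of the advice above (add start time (P580) and end time (P582) only when they are known, and put retrieved (P813) in the reference), a membership statement could be assembled as in the following sketch. It builds the statement as Wikibase-style JSON in plain Python; the helper names, the default main property P131 and the placeholder item id used in the usage note are assumptions, and the bot itself was described as using pywikipedia rather than raw JSON.

```python
from typing import Dict, Optional

# Calendar model URI for the proleptic Gregorian calendar
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def time_snak(prop: str, iso_date: str) -> Dict:
    """Build a day-precision time snak from a date given as YYYY-MM-DD."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "time",
            "value": {
                "time": f"+{iso_date}T00:00:00Z",
                "timezone": 0,
                "before": 0,
                "after": 0,
                "precision": 11,  # 11 means day precision
                "calendarmodel": GREGORIAN,
            },
        },
    }


def item_snak(prop: str, qid: str) -> Dict:
    """Build a snak whose value is another item, e.g. the intercommunalité."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": int(qid[1:])},
        },
    }


def membership_claim(target_qid: str, retrieved: str,
                     start: Optional[str] = None, end: Optional[str] = None,
                     prop: str = "P131") -> Dict:
    """Statement with P580/P582 qualifiers only where the dates are known."""
    qualifiers = {}
    if start:
        qualifiers["P580"] = [time_snak("P580", start)]
    if end:
        qualifiers["P582"] = [time_snak("P582", end)]
    claim = {
        "type": "statement",
        "rank": "normal",
        "mainsnak": item_snak(prop, target_qid),
        # the access date belongs in the reference, not in a qualifier
        "references": [{"snaks": {"P813": [time_snak("P813", retrieved)]}}],
    }
    if qualifiers:
        claim["qualifiers"] = qualifiers
    return claim
```

For example, `membership_claim("Q42", retrieved="2015-07-25")` (with Q42 standing in for the real target item) yields a statement with a reference but no date qualifiers, which matches the case where the start date is unknown.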

AviBot

AviBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Offthewoll (talk • contribs • logs)

Task/s: Retrieve information about universities on Wikidata.

Code: Can provide upon request.

Function details: Retrieve information about universities on Wikidata. This bot is for reads only, no editing. Using a small Python script I've written to get a list of entities using the wdq API and then get information about each one using wbgetentities with the Wikidata API. --Offthewoll (talk) 21:29, 17 May 2016 (UTC)
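A read-only workflow like the one described could batch its wbgetentities requests roughly as follows. This is a sketch, not the operator's script: the batch size of 50 ids per request is assumed here as the limit for accounts without a bot flag, and `batches` and `entity_urls` are hypothetical helper names. The sketch only builds the request URLs, so it can be checked without touching the API.

```python
from typing import Iterator, List
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"
BATCH = 50  # assumed per-request id limit without a bot flag


def batches(ids: List[str], size: int = BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most size ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def entity_urls(ids: List[str]) -> Iterator[str]:
    """Yield one wbgetentities request URL per batch of entity ids."""
    for chunk in batches(ids):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(chunk),  # multiple ids are pipe-separated
            "format": "json",
        }
        yield API + "?" + urlencode(params)
```

With a list of ids returned by the wdq query, each generated URL can then be fetched in turn, ideally with a pause between requests.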

@Offthewoll: Why do you think you need a bot flag? Are you going to reach any API limits? Matěj Suchánek (talk) 06:44, 9 June 2017 (UTC)

InteliBOT

InteliBOT (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Miguel2706 (talk • contribs • logs)

Task/s: Move categories in eswiki

Code: A small modification of pywikibot.

Function details: Example 1 Example 2 --Miguel2706 (talk) 20:13, 16 January 2015 (UTC)

@Miguel2706: Could you please indicate whether bot permissions are still needed? Thank you. --Vogone (talk) 19:49, 12 February 2017 (UTC)

mahirbot

mahirbot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Mdmahir (talk • contribs • logs)

Task/s:

  1. To update descriptions of Tamil films in both English and Tamil, for more than 5000 items.
  2. To update the Wikimedia category for Tamil films (5000+ items)


Code: http://tools.wmflabs.org/wikidata-todo/quick_statements.php

Data source: wikidataquery


Function details:

  • Description(English): Tamil Film (2014)
  • Description(Tamil): தமிழ்த் திரைப்படம் (2014)

Note: 2014 is production year of the film

Because it's 5000+ items, I prefer to use a bot account with community consensus. Thanks --Mdmahir (talk) 04:22, 25 February 2016 (UTC)
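For a batch of this size, the input for QuickStatements could be generated with a small script. The sketch below assumes the tab-separated command format in which `Den` and `Dta` set the English and Tamil descriptions and the text is enclosed in double quotes. The function name `description_commands` is hypothetical, and the item id and year would come from the wikidataquery results.

```python
from typing import List


def description_commands(qid: str, year: int) -> List[str]:
    """Return QuickStatements lines setting the en and ta descriptions."""
    en = f'{qid}\tDen\t"Tamil Film ({year})"'
    ta = f'{qid}\tDta\t"தமிழ்த் திரைப்படம் ({year})"'
    return [en, ta]


def batch_commands(films: List[tuple]) -> str:
    """Join the commands for (qid, production year) pairs, one per line."""
    lines = []
    for qid, year in films:
        lines.extend(description_commands(qid, year))
    return "\n".join(lines)
```

Generating the commands offline makes it easy to review a sample before pasting the full batch into the tool.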

Perhaps, you can apply for a flood flag (for the created bot account or your account) on WD:BN, as this is a one-time task using a mass editing tool. XXN, 17:53, 10 May 2017 (UTC)
Pinging @Emijrp: as he has an approved bot for a similar task, filling better descriptions for film items. XXN, 18:16, 10 May 2017 (UTC)

My code is available if Mdmahir (talk • contribs • logs) wants to use it. I can't add new languages myself right now. I am off for a few weeks. Emijrp (talk) 19:03, 13 May 2017 (UTC)

SaamDataImportBot

SaamDataImportBot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Rkbrasse (talk • contribs • logs)

Task/s: Import Smithsonian American Art Museum related data, including video, publications, exhibitions ...

Code: In the works

Function details: The first set of data imported will be about videos related to exhibitions and artists that we have published on YouTube. We will branch out to general museum data, exhibitions data, objects data and publications data if not already present. I will be more specific once the function is more nailed down and tested.

Video data mapping

  • We need a new Entity Type called Online Video that will contain the following properties
  • Title maps to P1476
  • url maps to streaming media
  • thumbnail needs to map to an image property

--Rkbrasse (talk) 18:45, 20 April 2016 (UTC)

welvon-bot

welvon-bot (talk • contribs • new items • SUL • Block log • User rights log • User rights)
Operator: Welvon-bot (talk • contribs • logs)

Task/s: Add properties to Wikidata items by mining the text of the Wikipedia articles that belong to each item.

Code: Not implemented yet!

Function details:

  1. Scan the first and/or second paragraph of the Wikipedia article, which usually defines the article's subject.
  2. Feed the scanned text to the model, which analyses it.
  3. The model's output should be the properties of the Wikidata item.
  4. Update the properties of the Wikidata item using the API.
  5. Restart from step 1.

--Welvon-bot (talk) 08:56, 1 May 2016 (UTC)