Shortcuts: WD:RFBOT, WD:BRFA, WD:RFP/BOT

Wikidata:Requests for permissions/Bot

To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks.


Old requests go to the archive.

Once consensus is obtained in favor of granting the bot flag, please post requests at the bureaucrats' noticeboard.

Bot Name | Request created | Last editor | Last edited
FLOSSbot | 2016-07-25, 17:37:46 | Dachary | 2016-07-25, 22:45:42
Lisp.hippie.bot | 2016-07-12, 16:45:22 | Ymblanter | 2016-07-25, 19:48:07
LandesfilmsammlungBot | 2016-05-30, 20:48:23 | Ymblanter | 2016-07-18, 10:58:16
Abbe98 Bot | 2016-07-09, 21:16:11 | Ymblanter | 2016-07-12, 08:03:59
KaldariBot | 2016-07-08, 00:15:23 | Jura1 | 2016-07-20, 04:58:43
WikiLovesESBot | 2016-07-03, 10:25:13 | Discasto | 2016-07-22, 22:02:47
MatSuBot 6 | 2016-07-01, 19:12:23 | Ymblanter | 2016-07-05, 14:47:41
mro-bot | 2016-03-17, 10:27:44 | Edgars2007 | 2016-07-18, 07:09:11
1-Byte-Bot | 2016-03-02, 15:23:09 | 1-Byte | 2016-03-03, 08:58:36
Hkn-bot | 2016-01-16, 18:52:00 | Alphos | 2016-01-20, 22:38:01
RollBot | 2016-01-14, 17:01:42 | Alphos | 2016-04-06, 13:40:49
Dexbot 11 | 2015-04-07, 18:15:00 | Hazard-SJ | 2015-12-28, 05:40:11
KunMilanoRobot | 2014-01-21, 19:27:44 | Alphama | 2016-06-21, 18:15:06

FLOSSbot

FLOSSbot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Dachary (talk · contribs · logs)

Task/s: Add software quality assurance P2992 statements with qualifiers for github/travis coupling

Code: http://paste2.org/39ekOOeX

Function details:

Look for software hosted on github.com that has a matching travis-ci.org continuous integration page and no software quality assurance P2992 statement. Create a statement with the described at URL qualifier P973 set to the .travis.yml file hosted in the repository and the archive URL qualifier P1065 set to the results archived at travis-ci.org --Dachary (talk) 17:37, 25 July 2016 (UTC)

I'm not sure how useful this is - does the presence of travis files ensure that this process is actually being followed? However, the bot seems to be functioning correctly - i.e. it's checking that the files exist and linking nicely to them, so I suppose there's no harm in approving this. ArthurPSmith (talk) 18:32, 25 July 2016 (UTC)
The travis file in itself does not, you are correct. But the matching URL on travis-ci.org runs it and accurately displays the state of the test results on every commit. Dachary (talk) 22:44, 25 July 2016 (UTC)
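
The lookup described in the function details can be sketched roughly as follows. This is a minimal illustration, not the code linked in the request: the URL layouts, the default branch name `master`, and the helper names `travis_urls` and `plan_qa_statement` are all assumptions, and the sketch does not check over the network that either URL actually exists. Properties are keyed by label here rather than by ID.

```python
from urllib.parse import urlparse


def travis_urls(github_url, branch="master"):
    """Derive the two qualifier URLs for a github.com repository URL.

    Returns (described_at_url, archive_url), or None when the URL does not
    point at a github.com repository. The URL layouts and the default branch
    are assumptions made for this sketch.
    """
    parsed = urlparse(github_url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc.lower() != "github.com" or len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    described_at = f"https://github.com/{owner}/{repo}/blob/{branch}/.travis.yml"
    archive = f"https://travis-ci.org/{owner}/{repo}"
    return described_at, archive


def plan_qa_statement(has_qa_statement, github_url):
    """Return a plain description of the statement to add, or None to skip."""
    if has_qa_statement:
        # The item already carries a software quality assurance statement.
        return None
    urls = travis_urls(github_url)
    if urls is None:
        return None
    described_at, archive = urls
    return {
        "property": "software quality assurance",
        "qualifiers": {
            "described at URL": described_at,
            "archive URL": archive,
        },
    }
```

A real run would still need to confirm that the .travis.yml file and the travis-ci.org page exist before saving anything, which is the check ArthurPSmith describes the bot as doing.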

KaldariBot

KaldariBot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Kaldari (talk · contribs · logs)

Task/s: Replace instances of country (P17):Antarctic Treaty area (Q21590062) with country (P17):novalue, per these three discussions. Overall, 7 people have expressed an opinion on the matter: 5 in favor of the change, 1 opposed, and 1 neutral. I think that's enough of a consensus to make the change.

Code: https://github.com/kaldari/wikidata-fixes/blob/master/fix.php

Function details: This bot takes all items that currently have the claim country (P17):Antarctic Treaty area (Q21590062) (i.e. places in Antarctica), removes their existing country claims, and adds the claim country (P17):novalue. It waits 2 seconds after each edit (to prevent database lag). --Kaldari (talk) 00:15, 8 July 2016 (UTC)

I'm quite sure you can change the target of the claim in one edit. Multichill (talk) 09:08, 9 July 2016 (UTC)
That is indeed possible with the API in use: setMainSnak( new PropertyNoValueSnak( PropertyId::newFromNumber( 17 ) ) ). Mbch331 (talk) 17:28, 10 July 2016 (UTC)
@Multichill, Mbch331: I was thinking it would be safer to remove all the country claims first as many locations in Antarctica are claimed by multiple countries. Otherwise we might end up with multiple 'country:novalue' statements (if we're just changing the existing claims). I'm open to doing it either way though. Kaldari (talk) 19:02, 12 July 2016 (UTC)
@Multichill, Mbch331: Does my explanation above make sense, or would you prefer that I just change the existing claims? Kaldari (talk) 21:36, 19 July 2016 (UTC)
We are still discussing the best approach on Property talk. Beyond what people like, dislike, or are used to seeing, we found a definition on which some of us agree and a reference for the current approach. As we are exploring alternatives and seeking references for these, it seems premature to make any change for now.
--- Jura 04:58, 20 July 2016 (UTC)
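
To make the "remove all, then add one" approach from the thread concrete, here is a minimal sketch of the planning step. It is not the PHP code linked in the request: the claim shape (a dict with `id` and `value`), the helper names, and the `apply_edit` callback are hypothetical. The point it illustrates is that removing every country claim and adding a single novalue claim cannot leave an item with several 'country:novalue' statements, which is the concern Kaldari raises about changing claims in place.

```python
import time

COUNTRY = "P17"
ANTARCTIC_TREATY_AREA = "Q21590062"


def plan_country_fix(country_claims):
    """Plan the edit for one item.

    country_claims is a list of dicts with keys "id" and "value", where
    "value" is a QID string or None for a novalue claim (a simplified,
    hypothetical shape, not the Wikibase JSON format).
    Returns (claim_ids_to_remove, add_novalue).
    """
    if not any(c["value"] == ANTARCTIC_TREATY_AREA for c in country_claims):
        # Item is out of scope: leave its country claims alone.
        return [], False
    # Remove every country claim so exactly one novalue claim remains.
    return [c["id"] for c in country_claims], True


def run(items, apply_edit, pause_seconds=2.0, sleep=time.sleep):
    """Apply the plan to each item, pausing after every edit."""
    edited = 0
    for item_id, claims in items.items():
        to_remove, add_novalue = plan_country_fix(claims)
        if not add_novalue:
            continue
        apply_edit(item_id, to_remove, add_novalue)
        sleep(pause_seconds)  # 2 seconds per edit, as stated in the request
        edited += 1
    return edited
```

Changing the existing claim in a single edit, as Multichill and Mbch331 suggest, would need an extra step to merge duplicates on items claimed by several countries.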

WikiLovesESBot

WikiLovesESBot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Discasto (talk · contribs · logs)

Task/s: Miscellaneous tasks associated with photo upload campaigns promoted by WM-ES:

  • Assignment of Commons categories to items handled in the campaigns (for example Wiki Loves Earth, Wiki Loves Folk, Wiki Loves Monuments, Photographs from Spanish Municipalities without pictures, and the like).
  • Sourcing of statements for items handled in the campaigns...

Code: Global repository is in here. Bot code is here.

Function details: The bot takes as input a series of lists (so-called annexes in the Spanish Wikipedia; see example here) and extracts the necessary information: mainly the Wikidata item and the Commons category. If found, the bot does as follows:

  • Looks up the Wikidata item.
  • Determines whether a "Municipality of Spain" statement is available in the P31 claim. If not, it creates the statement. If available, the statement is sourced to Spanish Wikipedia.
  • If the source (the list in the Spanish Wikipedia) provides a category, the bot determines whether a claim for Commons-category is available. If not, it creates the claim. If available, the claim is sourced to Spanish Wikipedia.
  • Finally, a commons sitelink for the category provided in the source is inserted if not available. If a gallery was already provided as commons sitelink, it's not modified.
  • Inconsistencies are logged during the process.

--Discasto (talk) 10:25, 3 July 2016 (UTC)
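
The steps listed above can be sketched as a pure decision function. This is an illustration only, not the bot code linked in the request. The item shape, the action tuples, and the `municipality_qid` parameter are hypothetical; P373 is assumed to be the Commons category property (the request names no ID); and treating an existing but different category value as an inconsistency to log is one reading of the last bullet.

```python
COMMONS_CATEGORY = "P373"  # assumed property ID; the request names no ID
INSTANCE_OF = "P31"


def plan_actions(item, category, municipality_qid, add_sources=True):
    """Decide what to do for one list entry.

    item is a simplified, hypothetical view of a Wikidata item:
    {"claims": {property_id: [values]}, "sitelinks": {site: title}}.
    category is the Commons category from the list, or None.
    Returns a list of (action, target, value) tuples.
    """
    actions = []
    claims = item.get("claims", {})

    # Step 2: "Municipality of Spain" in P31, created or sourced.
    if municipality_qid not in claims.get(INSTANCE_OF, []):
        actions.append(("add_claim", INSTANCE_OF, municipality_qid))
    elif add_sources:
        actions.append(("add_reference", INSTANCE_OF, municipality_qid))

    if category:
        # Step 3: Commons category claim, created or sourced.
        existing = claims.get(COMMONS_CATEGORY, [])
        if not existing:
            actions.append(("add_claim", COMMONS_CATEGORY, category))
        elif category in existing:
            if add_sources:
                actions.append(("add_reference", COMMONS_CATEGORY, category))
        else:
            # Step 5: the list and the item disagree, so only log it.
            actions.append(("log_inconsistency", COMMONS_CATEGORY, category))

        # Step 4: add a Commons sitelink only if none exists yet;
        # an existing gallery sitelink is left untouched.
        if "commonswiki" not in item.get("sitelinks", {}):
            actions.append(("add_sitelink", "commonswiki", "Category:" + category))
    return actions
```

The `add_sources` flag is there so the sourcing steps can be switched off without touching the rest of the logic.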

Support I strongly support this request. --Rodelar (talk) 22:04, 3 July 2016 (UTC)
Support I also support. --Harpagornis (talk) 15:00, 4 July 2016 (UTC)
Support I support this request. Ivanhercaz (talk) 16:17, 4 July 2016 (UTC)
Support I support this request. --Bauglir (talk) 16:28, 4 July 2016 (UTC)
Support I support this request. --ElBute (talk) 16:47, 4 July 2016 (UTC)
Support I support this request. --Pedro J Pacheco (talk) 20:14, 4 July 2016 (UTC)
Support The bot operator is reliable and knows what he does Poco2 21:07, 4 July 2016 (UTC)
Support I support this request. The operator has done a good work with other bots in different projects. --Millars (talk) 15:47, 5 July 2016 (UTC)
Support I support this request. --Dorieo (talk) 17:41, 6 July 2016 (UTC)
  • Sounds like a good idea. It seems we already have more municipalities than there should be ([1] compared to 8122 on w:Municipalities of Spain). Obviously, it could include some former municipalities. We should only have 1 item per municipality. It could be that interwikis need to be fixed. What is the plan if P31 holds concejo of Asturias (Q5055981) or parroquia (Q3333265)?
    --- Jura 09:09, 7 July 2016 (UTC)
Sorry, Jura, I missed your comment. I have to say that I don't fully understand it (mainly the part related to the mismatch in the number of municipalities). With regard to the second part, I will patch the code to also consider subclasses. However, parroquia (Q3333265) does not apply, as a parroquia is a subdivision of a municipality. The lists we're handling have been reviewed several times by the WM-ES members and all the items are actually municipalities. Smaller subdivisions can be considered in future editions, but not now. Therefore, my only concern relates to the subclasses (I didn't actually consider that possibility). Best regards --Discasto (talk) 22:02, 22 July 2016 (UTC)
  • It would be great if some active editors of Wikidata could give their opinions. Canvassing of users with a low number of contributions doesn't help. Sjoerd de Bruin (talk) 14:55, 7 July 2016 (UTC)
I took a look at contribs - it looks like a lot of entries have already been made, but the bot was blocked as unapproved. From my review of the entries made the bot seems to be operating reasonably. However, adding a reference of "imported from xx wikipedia" is barely better than no source at all, I'm not sure this is really helpful. If there's an actual es.wikipedia.org page that is the source of the information, providing that via "reference URL" and "retrieved on" properties would be more useful. An external source for this data would be much better. ArthurPSmith (talk) 14:42, 8 July 2016 (UTC)
I have no strong opinion on this. I do agree on providing an external source if available. It's not the case in most of the situations we're handling. Therefore, I'll simply skip this step. In fact, the core functionality (which I'm currently doing by hand) was related to setting commons categories. As we're handling all the items in the list, it seemed sensible to add sources. If you feel it's useless (unless a proper source is provided), I'll skip this step. Thanks for providing feedback --Discasto (talk) 22:45, 12 July 2016 (UTC) PS: yes, it's been blocked in the middle of a task that nowadays I have to do by hand. I don't really understand this block. Seems to me the typical bureaucratic behaviour that harms more than helps
I am going to approve the bot tomorrow provided there have been no objections.--Ymblanter (talk) 09:46, 13 July 2016 (UTC)
It would be good to have an answer to my question. We don't want to end up with even more duplicates.
--- Jura 12:34, 15 July 2016 (UTC)

MatSuBot 6

MatSuBot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Matěj Suchánek (talk · contribs · logs)

Task: Convert HTML entities in terms and maybe statements to regular text.

Code: Not yet decided on the implementation.

Function details: The biggest problem at the moment is querying for items which have such errors (if I don't find any other possibility, I will try to combine SQL and PWB). --Matěj Suchánek (talk) 19:12, 1 July 2016 (UTC)

Please make some test edits.--Ymblanter (talk) 14:47, 5 July 2016 (UTC)
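
Since the implementation is not yet decided, the following is only a sketch of what the conversion step itself could look like once candidate items have been found. It uses Python's standard `html.unescape`; the `fix_terms` helper and the term shape (a dict of language code to text) are hypothetical, and the querying problem mentioned above is not addressed.

```python
import html


def fix_terms(terms):
    """Return only the terms whose text changes after decoding HTML entities.

    terms maps a language code to a label, description, or alias string.
    """
    changes = {}
    for lang, text in terms.items():
        fixed = html.unescape(text)
        if fixed != text:
            changes[lang] = fixed
    return changes
```

Returning only the changed terms keeps the resulting edit minimal. A term that legitimately contains an entity-like sequence would also be rewritten, so test edits would probably want a manual review of the first results.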

mro-bot

mro-bot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Mro (talk · contribs · logs)

Task/s: Update ratings of chess players

Code: Based on pywikibot

Function details: Every time the Chess Federation (FIDE) publishes a new rating list (every month, currently), update the corresponding properties of active chess players (property P1087, Elo rating). This used to be done by DrTrigonBot (talk · contribs · logs) (see here), but the updates stopped more than a year ago; I don't know why. Later, possibly extend the updates to related properties, such as world ranking, women's ranking and national ranking. This information is generally present in the chess player infobox, but it is not using Wikidata since the data is outdated (I guess). I am an active contributor to the French Wikipedia (fr:Utilisateur:Mro) with 50k contribs and I have used pywiki occasionally since 2008 (always supervised edits). --Mro (talk) 10:26, 17 March 2016 (UTC)

This sounds good to me - it would be good to see some sample edits from the bot for this first though - run it on a dozen or so players just to confirm it's operating properly. Also I'm wondering if we should look into the issue of bots operating for a while and then ceasing - if the code is available and working somebody else could take it on (with a new bot account)? ArthurPSmith (talk) 14:30, 18 March 2016 (UTC)
Please make some test edits.--Ymblanter (talk) 18:37, 27 March 2016 (UTC)
Pinging Wesalius, as he also wanted to do this (but probably won't until Pasleim finds the source code). @Mro: but we should also store all historical ratings, shouldn't we? --Edgars2007 (talk) 13:21, 26 April 2016 (UTC)
Yes, I was going to run the same bot. My plan was to first get a bot up and running that would update the Elo ratings of go players and then move on to chess players. Right now I am at the end of the go phase; I have the script ready (result of the first round of Elo updating), but I am waiting for the result of the kind help of Neo-Jay, who is matching the remaining go player items with English transcriptions of their names, since most of them already have an item, just not with an English label. --Wesalius (talk) 14:16, 26 April 2016 (UTC)

@Mro: any update? --Edgars2007 (talk) 07:09, 18 July 2016 (UTC)
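
One way to keep historical ratings while still updating every month is to add a new dated value for each rating list instead of overwriting the previous one. The sketch below illustrates that idea only. It is not the bot's code: the statement shape and the `plan_rating_update` helper are hypothetical, and the use of point in time (P585) as the dating qualifier is an assumption that nobody in the thread has specified.

```python
ELO_RATING = "P1087"     # property named in the request
POINT_IN_TIME = "P585"   # assumed qualifier, not named in the request


def plan_rating_update(existing, rating, list_date):
    """Plan one new dated rating statement for a player, or None to skip.

    existing is a list of {"rating": int, "date": "YYYY-MM-DD"} dicts
    already on the item (a simplified, hypothetical shape).
    """
    if any(s["date"] == list_date for s in existing):
        # This rating list was already imported for the player.
        return None
    return {
        "property": ELO_RATING,
        "value": rating,
        "qualifiers": {POINT_IN_TIME: list_date},
    }
```

Skipping dates that are already present makes a monthly run safe to repeat after an interruption.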

1-Byte-Bot

1-Byte-Bot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: 1-Byte (talk · contribs · logs)

Task/s: Import census data from the Turkish Statistical Institute.

Code: Based on pywikibot

Function details:

--1-Byte (talk) 15:22, 2 March 2016 (UTC)

Update: Currently on hold as it's not entirely clear how to cite the data. --1-Byte (talk) 08:58, 3 March 2016 (UTC)

Phenobot

Phenobot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Jjkoehorst (talk · contribs · logs)

Task/s: The first step will be to improve the lineage annotation of organisms, including taxon identifiers, correct species names and corresponding references, using the UniProt Taxonomy database. The next step will be to add missing organisms to Wikidata, along with phenotypic information such as biosafety level, oxygen requirements and other features. Ongoing discussion can be found at User:Phenobot/Discussion

Code: https://bitbucket.org/jjkoehorst/wikidatabots

Function details: This bot is based on the ProteinBoxBot framework. It will use the UniProt Taxonomy SPARQL endpoint for data extraction and will initially work on completing existing entries as far as possible with correct names and taxon identifiers; missing species will be added to WD. For strains with existing phenotypic information, this can be complemented from various sources that are currently under investigation, such as GOLD or DSMZ. --jjkoehorst (talk) 15:13, 4 February 2016 (UTC)

Notified participants of Wikiproject Taxonomy: Josve05a (talk), FelixReimann (talk), Infovarius (talk), Daniel Mietchen (talk), Soulkeeper (talk), Brya (talk), Klortho (talk), Delusion23 (talk), Andy Mabbett (talk), Dan Koehl (talk), Achim Raschka (talk), TomT0m, Tinm, MPF, Abbe98, Rod Page, Joel Sachs, Prot D, Michael Goodyear, PhiLiP, pvmoutside, Faendalimas, Lymantria (talk)

@Succu: Can you have a look at this request? --Pasleim (talk) 10:32, 5 February 2016 (UTC)
I have some problems with the task "correct species names". NCBI is not a nomenclatural database. It contains spelling errors like other databases too. And I have problems with this kind of sourcing. The NCBI-ID is already referenced; nothing is imported from UniProt. The Disclaimer tells us „The NCBI taxonomy database is not an authoritative source for nomenclature or classification - please consult the relevant scientific literature for the most reliable information.“ --Succu (talk) 11:40, 5 February 2016 (UTC)
Here the Bot removed taxon name (P225). --Succu (talk) 12:10, 5 February 2016 (UTC) PS: Pseudomonas putida 10-23 (Q22661287) P225 is missing. --Succu (talk) 07:24, 6 February 2016 (UTC)
I agree with Succu. Why go change species names, based on UniProt? Could do serious damage. And indeed that kind of sourcing is unwanted and adds nothing: database is slow enough as it is. - Brya (talk) 11:58, 5 February 2016 (UTC)
This proposal does not seem to be mature. The Uniprot taxonomy database is a customized version of the NCBI taxonomy database, which itself is not reliable for taxonomy anyway. It is currently not clear if the bot owner knows enough about taxonomy and nomenclature to understand the issues associated with Wikidata taxon items. Also the proposed use of imported from (P143) does not seem appropriate.
Nevertheless my understanding is that many of this bot's contributions would be made in microbiology, and the issues would be a little different if its contributions were limited to this area. Otherwise I see no reasons to prevent the bot from adding “biosafety levels, oxygen requirements and other [such] features”.
Tinm (d) 18:29, 5 February 2016 (UTC)
Yes, the main basis of this bot will be within microbiology and I can restrict the bot to remain within prokaryotes. About the naming: what I am currently doing is to leave the name alone if it exists in the UniProt taxonomy as either other name or scientific name. But I can leave the name as it is, as I am mostly relying on the taxonomic identifier from the NCBI/UniProt. My main priority is to have the NCBI Taxonomy identifier correct / filled in so that I can include the phenotypic characteristics and can also easily verify whether an organism page has been created and, if not, create one. I can also skip adding references if one is already available. --jjkoehorst (talk) 06:45, 6 February 2016 (UTC)
Yes, this taxon name is pretty bad. And again, the fact that the rank is that of species does not need a reference (this is so by definition), and as there is a link to NCBI, the fact that the taxon name is accepted by NCBI does not need to be repeated in the form of a reference to taxon name. - Brya (talk) 07:44, 6 February 2016 (UTC) -also beyond understanding - Brya (talk) 07:51, 6 February 2016 (UTC) - And "instance of taxon" means that "taxon name" is present in the item. UniProt cannot know anything about that, so adding a reference to "instance of taxon" is pure misrepresentation. - Brya (talk) 07:58, 6 February 2016 (UTC)
Sorry about the naming issues. I'll restrict the bot then to only prokaryotes if you prefer, and to only update missing naming and NCBI Taxonomy information. When that works out well, I'll make some property requests for the phenotypic information as stated earlier, OK? --jjkoehorst (talk) 09:31, 6 February 2016 (UTC)
If that means 1) only missing names of prokaryotes and 2) sourcing only for NCBI Taxonomy information, then yes, OK. - Brya (talk) 13:00, 6 February 2016 (UTC)
Looks like the databases are out of sync. NCBI Taxonomy ID (P685)=208964 gives Pseudomonas aeruginosa PAO1 (www.ncbi.nlm.nih.gov/taxonomy) and Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228) (www.uniprot.org/taxonomy). This explains „adjustments“ like this one. --Succu (talk) 11:05, 6 February 2016 (UTC)
Looks like UniProt provides five separate names, rolled up into one entry? - Brya (talk) 13:00, 6 February 2016 (UTC)
There is a mapping between NCBI Taxonomy ID (P685) and a so called „Official (scientific) name“ used by UniProt. So maybe we need a qualifier for P685 to indicate this name. --Succu (talk) 16:44, 6 February 2016 (UTC)
Yes, I had an email conversation with UniProt and this was a reply about that case: The idea is not to use a concise name. A same strain may be known by different names because it has been deposited in different organizations (institutions, private companies, etc) with different names. So we try to track these co-identical strain names used by the major concerned organizations for a specific strain. This name is stored as scientificName and all the variances are stored among other names. --jjkoehorst (talk) 19:30, 6 February 2016 (UTC)
So what's your conclusion? BTW: I stumbled over User:Phenobot/Discussion, which looks like an outline of the intended bot task, but not mentioned here. --Succu (talk) 20:28, 6 February 2016 (UTC)
Well, in one way it makes sense to use a general nomenclature which encapsulates all possible extra namings, but it is not the true scientific name. Maybe a taxon synonym name entry could be used which lists other names belonging to this organism. Yes, the discussion page is to discuss the roadmap after the general taxon identification and naming is completed. Sorry that I did not mention it here, but in my opinion it was not completed yet; feel free to comment on it if you like... --jjkoehorst (talk) 08:02, 7 February 2016 (UTC)
Strictly speaking these are not scientific names at all. The ICNP does not cover names at a rank lower than subspecies. AFAIK there is no formal system for naming strains, so this may well happen on an ad hoc basis, or according to a local standard. In fact, it would help somewhat not to put these in "taxon name". - Brya (talk) 08:29, 7 February 2016 (UTC)
Then I would suggest that the names currently in WD should correspond to the NCBI nomenclature or to any of the UniProt names (scientificnames/othernames). If this is not the case, then it should be either the scientific name from the NCBI or from UniProt if there is no reference available. What do you think? And where would you place the other names? As a common name or something else? --jjkoehorst (talk) 08:40, 7 February 2016 (UTC)
? The names in NCBI/Uniprot are not scientific names (not regulated by a Code of nomenclature). The most obvious way to handle strains would be to have a property "strain name" (perhaps to be combined with "parent taxon", etc). - Brya (talk) 09:33, 7 February 2016 (UTC)
My considerations are the same. --Succu (talk) 10:10, 7 February 2016 (UTC)
I agree a strain property should then be created which specifies the name of a strain? However, taxon name then becomes obsolete, for strains at least, if I am correct. The elements that are obligatory for strains are then parent taxon, taxon rank, NCBI Taxonomy ID, general labels and instance of. Anything else that can be used with the current properties? --jjkoehorst (talk) 11:49, 7 February 2016 (UTC)
Yes, this new property should be used instead of P225. This would reduce "Format" violations of P225 too. --Succu (talk) 12:54, 7 February 2016 (UTC)
Sounds good. Who is going to propose a new property for taxon name, and can this taxon name then also contain multiple values, such as synonyms of the strain name, or should another property be made for that? --jjkoehorst (talk) 14:47, 7 February 2016 (UTC)
I think we need a second property UniProt name to model the relationship to the NCBI id. In case of strains we could use aliases to add the name variants. You can propose them at Wikidata:Property proposal/Natural science. --Succu (talk) 18:49, 7 February 2016 (UTC)
A property "UniProt" to link to the UniProt-entries may be handy. Not sure what else you mean, as UniProt-entries may concern regular taxa as well as strains and whatever else UniProt includes. - Brya (talk) 06:40, 8 February 2016 (UTC)
I am not much in favour of multiple names in one item, and including out-of-use names beside the current name seems like a recipe for disaster. But we really do need a separate property "taxon synonym (string)" beside the present "taxon synonym [item]". - Brya (talk) 15:53, 7 February 2016 (UTC)
Yes, we should request a taxon synonym string variant. Then by default it would be the scientific name of the NCBI nomenclature if no better name is available? --jjkoehorst (talk) 19:50, 7 February 2016 (UTC)
Synonyms are an area full of hidden dangers. What we may really need are:
  • "taxon synonym, homotypic (item)"
  • "taxon synonym, heterotypic (item)"
  • "taxon synonym, homotypic (string)"
  • "taxon synonym, heterotypic (string)"
Especially heterotypic synonyms may vary strongly, depending on point of view (references!). Brya (talk) 06:40, 8 February 2016 (UTC)
I looked into: Property:P1843 which is a common name for a given taxon. As basis we could use the NCBI nomenclature for strains (and/or others?). And over time add the homotypic/heterotypic naming. Shall I run a test with the restricted settings I have now? Only bacteria, no name updating if there is a name available and no reference adding if the value is already present? --jjkoehorst (talk) 08:01, 8 February 2016 (UTC)
@Brya: Regarding how to handle synonyms, I have thought of a way of doing things that would solve a very big part of the issues we encounter with the current one. I'm going to make a post about that on the project talk page when I'll have a bit of time. It would imply significant changes but I really believe it would answer many issues efficiently. Anyway, I guess you will see when I put it up. —Tinm (d) 02:34, 9 February 2016 (UTC)
I will be most interested to see what you come up with. - Brya (talk) 06:13, 9 February 2016 (UTC)
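
The restricted settings proposed for a test run (only bacteria, no name update if a name is already present, no reference added if the value is already present) can be written down as a small decision function. This is a sketch under assumptions, not code from the bitbucket repository: the item and record shapes and the `plan_taxon_update` helper are hypothetical, and the question of which property a strain name should go into is deliberately left out, since the thread has not settled it.

```python
def plan_taxon_update(item, record, is_bacterium):
    """Plan edits for one existing item under the restricted settings.

    item:   {"taxon_name": str or None, "ncbi_id": str or None}
    record: {"name": str, "ncbi_id": str} taken from the external source
    Returns a list of action tuples; an empty list means "do nothing".
    """
    actions = []
    if not is_bacterium:
        # Restriction 1: stay within bacteria.
        return actions
    if not item.get("taxon_name"):
        # Restriction 2: only fill in a missing name, never change one.
        actions.append(("set_taxon_name", record["name"]))
    if not item.get("ncbi_id"):
        # Restriction 3: a reference is only added together with a new value.
        actions.append(("set_ncbi_id", record["ncbi_id"]))
        actions.append(("add_reference", "ncbi_id"))
    return actions
```

An item that already has both a name and an NCBI Taxonomy ID yields no actions at all, which is the behaviour Brya's conditions ask for.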

Greetings all. I am part of the GeneWiki team and I am adding genes and proteins for bacteria under our MicrobeBot (talk · contribs · logs) account. See: MicrobeBot Task Page. For my project it is important that there remain distinct strain items with NCBI taxonomy identifiers so I can link genes and proteins to them via found in taxon (P703). Just a thought, but could we distill some of the views here in a mockup of a Wikidata strain item in the table below, using Pseudomonas aeruginosa PAO1 (Q21065234) as an example? I added some of the basics that are there for strain items now. I personally think a new 'NCBI strain name' type of property would be a good thing to have, as these strain names are directly linked to the NCBI Taxonomy ID. Putmantime (talk) 18:46, 9 February 2016 (UTC)

Property | Description | Datatype | Expected value (if not listed, see property definition)
P225 | taxon name | String | Species name? From NCBI, UniProt?
P??? | strain name | String | Strain name From NCBI, UniProt, etc...
P171 | parent taxon | Item | Bacterial species item e.g. Pseudomonas aeruginosa (Q31856)
P105 | taxon rank | Item | Strain e.g. strain (Q855769)
P685 | NCBI Taxonomy ID | String | 208964

What we are talking about is this:

Property | Description | Datatype | Expected value (if not listed, see property definition)
P??? | strain name | String | Strain name From NCBI, UniProt, etc... e.g. Pseudomonas aeruginosa PAO1 (Q21065234)
P171 | parent taxon | Item | Bacterial species item e.g. Pseudomonas aeruginosa (Q31856)
P105 | taxon rank | Item | Strain e.g. strain (Q855769)
P685 | NCBI Taxonomy ID | String | 208964
P??? | UniProt ID | String | from UniProt, different from UniProt ID (P352)

- Brya (talk) 04:42, 10 February 2016 (UTC)

I agree. P225, P1420 and P1843 should not be taken from NCBI, UniProt? No items should be created on this basis. --Succu (talk) 06:51, 10 February 2016 (UTC) PS: I added UniProt ID (P352) and now miss something like UniProt name. --Succu (talk) 08:02, 10 February 2016 (UTC)
Not sure what you mean by "UniProt name". Is this something like "Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228)", which to me does not look like a name but five names, for what may be (deemed to be) one strain. - Brya (talk) 11:39, 10 February 2016 (UTC)
Yes, the so called „Official (scientific) name“ used by UniProt mapped to NCBI Taxonomy ID (P685). --Succu (talk) 12:01, 10 February 2016 (UTC)
It is a long list, and many names are regular scientific names. Could you point out a few examples? - Brya (talk) 12:07, 10 February 2016 (UTC)
  • 634452 ← Acetobacter pasteurianus (strain NBRC 3283 / LMG 1513 / CCTM 1153)
  • 4024 ← Acer saccharum
  • 441768 ← Acholeplasma laidlawii (strain PG-8A)
  • 237531 ← Actinomycete sp. (strain K97-0003)
  • 928294 ← Human adenovirus C serotype 1 (strain Adenoid 71)
  • 262698 ← Brucella abortus biovar 1 (strain 9-941)
  • 48984 ← Pantoea agglomerans pv. gypsophilae
  • 45222 ← Parana mammarenavirus (isolate Rat/Paraguay/12056/1965)
--Succu (talk) 12:23, 10 February 2016 (UTC)
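
To see why a UniProt name such as the Pseudomonas example reads as "five names" rather than one, it can help to split the parenthesised strain part mechanically. The sketch below does only that. The pattern is an assumption based on the examples quoted in this thread, the `split_uniprot_name` helper is hypothetical, and names using other markers (such as "isolate" or "pv.") are returned unchanged.

```python
import re

# Matches "<base name> (strain <designation> / <designation> / ...)".
# The pattern is inferred from the examples in this thread only.
_STRAIN = re.compile(r"^(?P<base>.+?)\s*\(strain (?P<names>[^()]+)\)$")


def split_uniprot_name(name):
    """Split a UniProt-style name into (base_name, strain_designations)."""
    name = name.strip()
    match = _STRAIN.match(name)
    if not match:
        # Regular names and other name forms are passed through untouched.
        return name, []
    designations = [part.strip() for part in match.group("names").split(" / ")]
    return match.group("base"), designations
```

For example, the Pseudomonas aeruginosa name quoted earlier splits into the base name plus the designations ATCC 15692, PAO1, 1C, PRS 101 and LMG 12228, while Acer saccharum comes back with an empty list. Designations split out this way could then be candidates for aliases, as suggested in the thread.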

But not all these names are unique to UniProt. For example, Acer saccharum is a regular botanical name, and Pantoea agglomerans pv. gypsophilae appears to be in fairly widespread use, as is Brucella abortus biovar 1 (strain 9-941). - Brya (talk) 17:32, 10 February 2016 (UTC)

My thought was that jjkoehorst wants to integrate these names somehow. If the speclist is important for the planned bot's job I can provide some statistics. --Succu (talk) 18:36, 10 February 2016 (UTC)
Eventually I would like to create a comprehensive but still useful taxonomy resource where people can easily search for organisms and their phenotypic characteristics, and where the information for a newly sequenced strain can easily be integrated into WD according to a defined data model. However, for this a solid foundation needs to be established first, and that is what I was thinking of. In general the primary identifier is the NCBI taxonomic number, which can be completed with information from NCBI scientific names and UniProt scientific / other names. If for obvious reasons this would introduce too many errors, or is not in line with the idea of how we should define a strain, then this is perfectly fine with me. What was driving me from the beginning is that I want to connect phenotypic information from multiple resources to taxonomic identifiers and the corresponding genetic makeup. I can of course do this on my own machine in my own little project, and this would work out fine, but no one else could benefit from it, and that's why I started working on the idea of this phenobot (hence the name...). In the discussion of the bot mentioned by Succu, I am expanding this idea further with possible phenotypic characteristics that I can get my hands on and that could theoretically be integrated into WD, but I am still writing this up at User:Phenobot/Discussion. --jjkoehorst (talk) 21:04, 10 February 2016 (UTC)
As an example these are statements that would be interesting to add. Not all have properties and I am preparing for that.
Property | Description | Datatype | Expected value
P1604 | biosafety level | Item | Level 1 Q18396533; Level 2 Q18396535; Level 3 Q18396538; Level 4 ... see Q21079489
P2043 | length / size | string | 902320 bp Q21481789
P??? | GC content | float |
P??? | Gram staining | item | Gram positive Q857288; Gram negative Q632006
P??? | Pathogenic to | item | Human, Plant, Animal, etc...
P??? | Motility | item | Chemotactic (Chemotaxis) Q658145; Motile Q3359; Nonmotile (not yet found)
P??? | Environment | item or string | soil, seawater, marine sediment, forest soil, etc...
P??? | Temperature range | item | Hyperthermophile Q1784119; Mesophile Q669652; Psychrophile Q913343; Thermophile Q834023
P2076 | Temperature (optimal temperature) | | Q21079489

--jjkoehorst (talk) 09:11, 11 February 2016 (UTC)

If all that is to be included in an item, it becomes understandable that Succu would like a UniProt name, and (presumably?) a separate item for each such UniProt entity. - Brya (talk) 17:26, 11 February 2016 (UTC)
If I understand you correctly you mean to store the Biosafety/Gram/Temp/etc.. in a UniProt item? These are generic features from different sources (DSMZ/GOLD/etc) and are linked via the NCBI Taxonomy ID and in that case would not make sense to store these items under a uniprot name entry. --jjkoehorst (talk) 19:46, 11 February 2016 (UTC)

Back to the roots

Oppose: Back to the roots. „Code“ is protected. I see no reactions to error reports. The task is obscure. jjkoehorst, please roll back your bot's contributions. --Succu (talk) 22:32, 11 February 2016 (UTC)

Code is unlocked and all revisions have been rolled back. Please let's continue discussing what shape would be acceptable for phenotypic information --jjkoehorst (talk) 06:51, 18 February 2016 (UTC)

I think there is great value in elements of what is proposed and it would make the microbial data on Wikidata a much richer resource. Metadata such as biosafety level, gram -/+ etc. would be very useful, but UniProt may not be the best source for Taxonomy identifiers and names. I think it would benefit this proposal to have a clear picture of what the scope of the project would be, and a clear definition of each bot task. Putmantime (talk) 23:16, 11 February 2016 (UTC)

Putmantime, would you mind helping? --Succu (talk) 23:21, 11 February 2016 (UTC)
Succu Yes definitely...can we keep the discussion going on this proposal? I think it has merit, but needs to be clearer. The naming issue for subspecies items seems to have thrown a wrench in things. I think NCBI is a good authority for strain names personally, because the name was submitted by the researcher that submitted sequence data to NCBI, and that is when the NCBI Taxonomy ID was generated as well as genome IDs. It is not a scientific name though, nor consistently formatted. I view it as an appropriate label, and maybe a new 'strain name' property, but I see that it shouldn't be a taxon name. Any synonyms could be aliases, IMHO Putmantime (talk) 23:34, 11 February 2016 (UTC)
I am in the process of rolling back the changes made by the bot. I think the focus of the conversation has shifted towards the naming issues, which still exist and need to be discussed thoroughly. Currently existing names will not be modified by the bot; its main focus is on the metadata that is available at various resources through the NCBI taxonomic identifier, which will not interfere with current information. I know that I initially started about the naming, but the main focus is on the metadata. Hopefully we can keep the discussion going on the naming scheme and microbial metadata to come to a good agreement to improve the quality of information in Wikidata. --jjkoehorst (talk) 17:36, 12 February 2016 (UTC)
In the NCBI Taxonomy strains have no rank. We should find a consensus that stating taxon rank (P105)=strain (Q855769) is OK. Otherwise we can use instance of (P31)=strain (Q855769) with taxon rank (P105)=novalue. --Succu (talk) 18:51, 12 February 2016 (UTC) E.g. Shigella flexneri 2a str. 301 (Q21102941), Putmantime. --Succu (talk) 22:13, 12 February 2016 (UTC)
There are similar cases elsewhere: "virus" as a subspecific entity is not regulated by a Code of nomenclature. This goes also for "forma specialis", "pathovar", etc. We should have a structure for this. - Brya (talk) 06:17, 13 February 2016 (UTC)
Yes we should. If I remember right f.sp. is used by IF and MycoBank as a rank. Strongly related to this bots task is the question of Candidatus (Q857968). --Succu (talk) 19:18, 13 February 2016 (UTC)
Yes, forma specialis is used by IF and MycoBank as a rank, but that does not make it a rank. And, yes, "Candidatus" is a similar problem case. - Brya (talk) 09:55, 14 February 2016 (UTC)

Hkn-bot

Hkn-bot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: HakanIST (talk · contribs · logs)

Task/s: Clean up invalid authority property links in person items; harvest dates of birth from articles

Code: based on addwiki framework (php) currently being written

Function details: The bot will periodically check whether the IDs given in P2458, P2446, P2447, P2448 and P2449 are valid by visiting the generated URL; if an ID cannot be validated, the bot will remove the claim and report it in a list. Such errors often occur because of invalid data at the source wiki or because a footballer ID has been mistaken for a manager ID. Secondly, the bot will harvest birth dates from the imported wikis for items with these properties. Using variations of this wdq generated list, it will add the date of birth property if there is none.
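
The validity check described above could be sketched roughly as follows. This is a minimal Python sketch, not the bot's own code (which is stated to be PHP, based on addwiki). The function names, the formatter string with a `$1` placeholder supplied by the caller, and the decision to treat only an HTTP 404 answer as "invalid" are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request failed entirely."""
    # Assumption: the target site answers HEAD requests; some sites only answer GET.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, e.g. with 404
    except OSError:
        return 0  # DNS failure, timeout, refused connection, ...


def find_invalid_ids(
    claims: Iterable[Tuple[str, str]],
    formatter_url: str,
    fetch_status: Callable[[str], int] = http_status,
) -> List[Tuple[str, str]]:
    """Return the (item, external_id) pairs whose generated URL answers 404.

    Network failures and other status codes are deliberately NOT reported,
    so that a temporary outage at the source does not lead to claim removal.
    """
    invalid = []
    for item_id, external_id in claims:
        url = formatter_url.replace("$1", external_id)
        if fetch_status(url) == 404:
            invalid.append((item_id, external_id))
    return invalid
```

Keeping the HTTP call behind the `fetch_status` parameter means the reporting logic can be tried against a table of canned status codes before any claim is actually removed.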

-- Hakan·IST 18:50, 16 January 2016 (UTC)

Please make some test edits.--Ymblanter (talk) 08:18, 20 January 2016 (UTC)
@Ymblanter: Ran the bot for the second task (harvesting dates of birth from articles) for 20 items, throttled to 10 seconds per change. Hkn-bot contribs.-- Hakan·IST 15:54, 20 January 2016 (UTC)
I see that e.g. here you added data but they are unreferenced. Is there any way to add a reference as well?--Ymblanter (talk) 16:36, 20 January 2016 (UTC)
I've been working on adding references for some time now, but have not got it to work yet.-- Hakan·IST 21:30, 20 January 2016 (UTC)
Your bot added a date of birth of "1 October 1987" for Q3801812, despite the itwiki page stating "10 gennaio 1987" (10 January 1987). I'm afraid a similar issue occurred on a few other items, for instance Q5889913 or Q6771150. --Alphos (talk) 22:37, 20 January 2016 (UTC)
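
The discussion does not say what caused the wrong dates, but "10 gennaio 1987" turning into "1 October 1987" is what a day/month swap would produce. As a hedged illustration only, here is a minimal Python sketch that parses the Italian spelled-out form directly, so the month is never guessed from a numeric position. The function name and the expected "day month year" input shape are assumptions.

```python
import datetime

# Italian month names mapped to month numbers.
ITALIAN_MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}


def parse_italian_date(text: str) -> datetime.date:
    """Parse a date written as '<day> <month name> <year>', e.g. '10 gennaio 1987'.

    Raises ValueError or KeyError on anything else, which is preferable to
    silently storing a guessed date.
    """
    day, month_name, year = text.split()
    day = day.rstrip("º°")  # '1º gennaio' is the usual form for the first of a month
    return datetime.date(int(year), ITALIAN_MONTHS[month_name.lower()], int(day))
```

Reading the same date from a numeric string such as 10/01/1987 with a month-first pattern would instead yield 1 October 1987, which matches the value reported for Q3801812.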

RollBot

RollBot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Alphos (talk · contribs · logs)

Task/s: Revert all edits made by users (crucial because of Quick Statements) or bots gone awry, once they've been temporarily blocked or stopped

Code: GitHub. The bot is written in PHP, and I know it's not ideal, but it's a language I'm very familiar with.

Function details:

Whenever any user (including but not limited to bots) edits a large number of pages or entities in a short amount of time, and makes a mistake on every single one of them, reverting all pages to their former state quickly becomes mind-numbing for a human. We therefore need a bot to perform such an action.

A plain rollback is not a viable way of reverting for at least two reasons:

  • it will erase valid edits made by the same user before going awry;
  • it won't erase edits if any other user has edited the page or entity afterwards.

RollBot takes the first "wrong" edit as a parameter, and reverts all edits made by the same user since the time of that first "wrong" edit. The interface currently only resides on my computer - it is not possible to automatically start RollBot on a task : this is a security feature to avoid using it to vandalize legitimate edits.

It first lists all pages/entities edited by the target. Then, for each of them, it gets the content of the version immediately before the target's first edit to that page since it started going awry, and takes it as the version to revert to, should the page be reverted. It then establishes a list of all contributors since that version, to make sure no other users edited that page.

It can either merely list the pages it couldn't revert without overwriting edits by users other than the target, or overwrite said edits. This is decided on a per-request basis.
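
The per-page decision described above can be sketched as follows. This is a minimal Python illustration of the logic as described in this request, not RollBot's actual PHP code; the `Revision` and `RevertPlan` types, the action names and the `plan_revert` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Revision:
    revid: int
    user: str
    timestamp: str  # ISO 8601 UTC in one fixed format, so string order equals time order


@dataclass
class RevertPlan:
    action: str  # "nothing", "revert", "human_check" or "delete_candidate"
    good_revid: Optional[int] = None
    other_editors: Set[str] = field(default_factory=set)


def plan_revert(history: List[Revision], target: str, start: str,
                overwrite: bool = False) -> RevertPlan:
    """Decide what to do with one page, given its history sorted oldest first."""
    # Find the target's first edit made at or after the "first wrong edit" time.
    first_bad = next(
        (i for i, rev in enumerate(history)
         if rev.user == target and rev.timestamp >= start),
        None,
    )
    if first_bad is None:
        return RevertPlan("nothing")
    if first_bad == 0:
        # The page was created by the target after going awry.
        return RevertPlan("delete_candidate")
    good = history[first_bad - 1]
    # Everyone other than the target who edited after the good version.
    others = {rev.user for rev in history[first_bad:] if rev.user != target}
    if others and not overwrite:
        return RevertPlan("human_check", good.revid, others)
    return RevertPlan("revert", good.revid, others)
```

Because the good version is the revision just before the first bad edit, any earlier valid edits by the same user are kept, which is the first of the two problems with a plain rollback listed above.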

After completing its task, it publishes a complete report page in its userspace, with the name of the target and start time in the title.

That report holds a list of pages/entities it edited, a list of pages/entities requiring human check (all revisions by the target were deleted; page was edited by another user; page was edit-blocked; those are the most likely explanations), and a list of pages/entities bound for deletion (created by the target after going awry - deleting could be performed by RollBot, but I'd rather make sure it has community acceptance for its base function before requesting sysop rights). You can see an example of the report here, which was created by the bot in its current version.

Whether or not RollBot finally acquires sysop rights, I see it becoming a very useful tool for the admins' noticeboard.

Wikidata is the first WMF wiki where I am implementing it (although it technically has the ability to edit other wikis) because of the relatively limited userbase, and the relatively high number of edits per minute any user (and not just bots) can reach using external tools like Quick Statements. I do plan on running it on other wikis once it has proven its worth.

--Alphos (talk) 17:01, 14 January 2016 (UTC)

The first series of test edits was short, and allowed me to spot one bug and one issue to address that cannot be considered a RollBot bug. I'll explain after properly investigating it; it seems some constraint was not met with the sitelinks in the "good version".
  • See this section of the admin's noticeboard.
  • RollBot successfully reverted all pages to their former state after being asked to do so - which resulted in a bunch of null edits, since they all already had been reverted to their former state.
  • All but one of them were already back to their former state, so RollBot essentially performed a null edit. As expected, it successfully listed all pages it "reverted" (or intended to), with the "first bad" revision, the "previous good" revision and its author, along with the editors (up to a limit, hardcoded for now, of 1 + 5, who are listed by nickname or IP address).
Alphos (talk) 21:03, 18 January 2016 (UTC)
There are two very similar issues with :
addshore (IRC : Freenode / #wikidata) helped me greatly in isolating the two issues (well, he pretty much solved the whole thing by himself ^^' ). He also suggested (but this is unrelated to RollBot) we find a way to list all similar issues on Wikidata.
I don't know what happened with Q12189183, due to a STUPID mistake on my part : I overwrote my error log instead of appending to it. That is now fixed. I cannot reproduce the error, but, should another one ever arise, we'll definitely know about it ! : all errors occurring when attempting to edit will from now on be visible in RollBot's reports.
Alphos (talk) 23:09, 18 January 2016 (UTC)
On a suggestion by GZWDer, I implemented an optional end timestamp parameter.
After removing a tiny kink that I thought I never added, I started RollBot again (Report).
Despite having a minimum of 5 seconds of doing nothing after an edit (thus a very strict minimum of 5 seconds between edits, usually more), it sadly got throttled for quite a few attempted edits. I take comfort in the fact that it successfully listed those failed attempts in the "Pages requiring human check" section, with a default explanation.
The real API error messages are in a file on my machine, and, although there are a lot of throttled edits, there are also a fair number of failed saves due to wikilink conflicts. It's usually because the page linked (in the entity RollBot is editing) is a redirect to a page linked in another entity.
I'll post a list of those in RollBot's report.
Alphos (talk) 18:29, 19 January 2016 (UTC)
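
One common way to cope with throttling is to retry with a growing delay before giving up and reporting the page. The discussion does not say that RollBot does this; the following is only a minimal Python sketch of that idea. The `Throttled` exception, the `attempt_edit` callable and the `edit_with_backoff` function are hypothetical names, and only the 5-second minimum pause comes from the comment above.

```python
import time
from typing import Callable


class Throttled(Exception):
    """Raised by the (hypothetical) edit callable when the wiki rate-limits an edit."""


def edit_with_backoff(attempt_edit: Callable[[], None],
                      min_delay: float = 5.0,
                      max_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try one edit, waiting longer after each throttled attempt.

    Returns True once the edit is saved, or False if every attempt was
    throttled, in which case the caller can list the page for human check.
    """
    delay = min_delay
    for _ in range(max_retries + 1):
        try:
            attempt_edit()
        except Throttled:
            sleep(delay)
            delay *= 2  # wait twice as long before the next attempt
            continue
        sleep(min_delay)  # always pause after a successful edit
        return True
    return False
```

Only throttling is retried here. Other failures, such as the conflicts caused by redirected links, would still propagate to the caller and end up in the report.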
Found 2 of those conflicting redirecting wikilinks :
The API simply prevents the bot from editing in case of such conflicts; there is no workaround that I know of. Good thing such conflicts are listed in the "Pages requiring human check" section of the bot's reports.
Alphos (talk) 19:08, 19 January 2016 (UTC)
What is the current situation with the bot? Is it ready for approval?--Ymblanter (talk) 21:05, 28 January 2016 (UTC)
I've been facing a heavy bout of a fairly serious medical condition these past few days, it's a bit on hold.
It's given me the idea to give a group of users (most likely admins/sysops/whatchamacallits) the ability to trigger the bot when I'm in that state, which may (and probably will) reoccur, but I'm sad to say dev is a bit on hold for the next few days - hopefully not more than a week (pleeeeaaaaase, I can't take this much longer!).
However, if I'm not mistaken, the bot in its current state is functioning as it should - if you don't count disability of its operator.
Alphos (talk) 11:34, 4 February 2016 (UTC)

@Alphos: In addition to the bot flag, which rights would this bot need? If administrator access is needed, then not only would you need a request for administrator access for the bot, but you yourself would have to run for adminship first.--Jasper Deng (talk) 09:28, 27 March 2016 (UTC)

Sorry for the late reply, been suffering from Q166907 for the past few months, which made programming a bit difficult - although I luckily did have the energy to make RollBot during an easier week -, and my treatment needed a few tweaks in the past few weeks - I'm not there yet, but hopefully I should soon be able to resume work on the bot and other projects !
I'm not aiming for bot adminship yet. The bot needs an extensive period of testing - it seems to behave adequately for now, barring the unconfirmed user limits it inadvertently hit, but I'm a prudent person, and I'd rather be damn sure it won't add work for the current human admins by performing unneeded sysop actions.
If, however, I'm satisfied (and I'm a hard person to satisfy) with its ability to perform the tasks in their entirety, including deleting items based on the conditions given to it, I do plan on requesting sysop rights. If that means I need to be an admin first, I'd apply as well of course.
Thanks a lot for your consideration.
Alphos (talk) 13:40, 6 April 2016 (UTC)

Dexbot

Dexbot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Ladsgroup (talk · contribs · logs)

Task/s: Auto-transliterating for names of humans

Code: Based on pwb; will probably be published soon.

Function details: The code analyses dumps of Wikidata and can build an auto-transliteration system for any given pair of languages based on that. I started with Persian and Hebrew (some test edits: [2] [3]) --Amir (talk) 18:14, 7 April 2015 (UTC)
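
The bot's code had not been published at the time of this request, so the following is only a toy Python sketch of how a transliteration table might be learned from label pairs found in a dump. It is not Dexbot's method. The word-by-word alignment, the `min_share` threshold and both function names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def learn_token_pairs(label_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each source-language word is paired with each target word.

    label_pairs holds (source_label, target_label) for items that already
    have a label in both languages.
    """
    table: Dict[str, Counter] = defaultdict(Counter)
    for source, target in label_pairs:
        src_tokens = source.split()
        tgt_tokens = target.split()
        if len(src_tokens) != len(tgt_tokens):
            continue  # skip labels that do not align word for word
        for src, tgt in zip(src_tokens, tgt_tokens):
            table[src][tgt] += 1
    return table


def transliterate(label: str, table: Dict[str, Counter],
                  min_share: float = 0.8) -> Optional[str]:
    """Return a transliterated label, or None if any word is unknown or ambiguous."""
    result = []
    for token in label.split():
        counts = table.get(token)
        if not counts:
            return None  # never seen this word
        best, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) < min_share:
            return None  # several competing transliterations, so do not guess
        result.append(best)
    return " ".join(result)
```

Returning None for words with several competing candidates means such labels are skipped rather than filled with an arbitrary choice.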

  • Comment, please let me know when you try your system for some Cyrillic language. I'd like to see it myself. --Infovarius (talk) 14:10, 8 April 2015 (UTC)
@Infovarius: I work with pairs of languages, like fa and he (where the bot adds the Persian transliteration based on Hebrew and vice versa). Which pair of languages do you suggest? en and ru? Amir (talk) 11:54, 9 April 2015 (UTC)
Probably you should have stated this in your request. Your phrase "I started with" has encouraged me :) No, I don't suggest Russian as I understand the complexity of the task. --Infovarius (talk) 13:16, 10 April 2015 (UTC)
@Infovarius: I don't think Russian is too complicated to attempt. I took care of lots of different issues including country of citizenship, etc., so it's not hard for this bot. I asked you which language you think is the best to pair with Russian *to start with* Amir (talk) 21:11, 10 April 2015 (UTC)
Will the bot be able to detect delicate labels as in King An of Han (Q387311)? --Pasleim (talk) 19:24, 13 April 2015 (UTC)
It probably skips them or makes a correct transliteration (depending on the language), but I can't say for sure. Let me test. Amir (talk) 13:33, 15 April 2015 (UTC)
Are we ready for approval here?--Ymblanter (talk) 16:08, 15 April 2015 (UTC)
  • Just a caveat when dealing with Chinese languages: Chinese to Latin script (and vice versa) transliterations are rarely standardized. For example, Alan Turing's given name might be transliterated into 艾伦 or 阿兰 (as in the case of Alan Moore (Q205739)) or 亚伦 (as in the case of Alan Arkin (Q108283)). These Chinese characters roughly resemble "Alan" when pronounced, but due to regional differences (i.e. mainland China, Taiwan, Hong Kong, etc.), they result in different transliterations. Even when two people's names are transliterated by the same region, they can be different. There is simply no standardization on this matter. —Wylve (talk) 14:53, 23 April 2015 (UTC)
    hmm, User:Wylve: Just a question: Is it wrong to put "亚伦" for Alan in Alan Turing? Amir (talk) 12:36, 25 April 2015 (UTC)
    It's not wrong, but it might not be the only way people call Alan Turing in Chinese. The lead sentence of Turing's article on zhwiki mentions that "Alan" is also transliterated as 阿兰. —Wylve (talk) 20:48, 25 April 2015 (UTC)
    @Wylve: I made 50 auto-transliterations [4], please check and say if anything is wrong or unusual. Thanks Amir (talk) 20:05, 16 May 2015 (UTC)
    I can't verify every name, since some of those people aren't mentioned in Chinese news sources. My standard of what is "wrong" or "unusual" is whether the transliterations you've produced are used predominantly in reliable and reputable sources. It is hard to judge sometimes, as there is a variety of transliterations used. For instance:
  • Jonathan Ross is transliterated as 强纳·森罗斯 and also 喬納森·羅斯
  • Leonard B. Jordan is also transliterated as 萊昂納德·B·喬丹
  • Jimmy Bennett is also transliterated as 吉米·本内特, 吉米班奈, 吉米班奈特.
  • Jason Lee is also named 杰森·李.
  • "Scott" from A. O. Scott is also transliterated as 史考特.
  • All of your edits should be fine if read in Chinese, as they all sound like their English name. Also, I have found this page ([5]), which documents Xinhua News Agency (Q204839)'s official transliterations of names. These transliterations are considered official only in Mainland China. —Wylve (talk) 21:58, 16 May 2015 (UTC)

@Ladsgroup, Wylve: Does this look okay for an approval, or is there something we're missing? I don't speak (or read, for that matter) Chinese. Hazard SJ 05:40, 28 December 2015 (UTC)

KunMilanoRobot

KunMilanoRobot (talk · contribs · SUL · Block log · User rights log · User rights management)
Operator: Kvardek du (talk · contribs · logs)

Task/s:

  • Add French 'intercommunalités' to French commune items (example)
  • Add the population of French communes
  • Correct Insee codes of French communes

Code:

Function details: Takes the name of the 'communauté de communes' in the Insee base and adds it if necessary to the item, with point in time and source. Uses pywikipedia. --Kvardek du (talk) 19:27, 21 January 2014 (UTC)

Imo the point in time qualifier isn't valid here as the property isn't time specific. -- Bene* talk 15:10, 22 January 2014 (UTC)
Property:P585 says "time and date something took place, existed or a statement was true", and we only know the data was true at January 1st, due to numerous changes in French organization. Kvardek du (talk) 12:18, 24 January 2014 (UTC)
Interesting, some comments:
  • Not sure that "intercommunalités" are really administrative divisions (they are built from the bottom rather than from the top). part of (P361) might be more appropriate than located in the administrative territorial entity (P131)
  • Populations are clearly needed but I think we should try do it well from the start and that is not easy. That seems to require a separate discussion.
  • INSEE code correction seems to be fine.
  • Ideally, the date qualifiers to be used for intercommunalité membership would be start time (P580) and end time (P582) but I can't find any usable file providing this for the whole country. --Zolo (talk) 06:37, 2 February 2014 (UTC)
Kvardek du: can you add « canton » and « pays » too? (canton is a bit complicated since some cantons contain only fractions of communes)
Regards, VIGNERON (talk) 14:01, 4 February 2014 (UTC)
Wikipedia is not very precise about administrative divisions (w:fr:Administration territoriale). Where are the limits between part of (P361), located on terrain feature (P706) and located in the administrative territorial entity (P131) ?
Where is the appropriate place for a discussion about population ?
VIGNERON : I corrected Insee codes, except for the islands : the same problem exists on around 50 articles due to confusion between articles and communes on some Wikipedias (I think).
Kvardek du (talk) 22:26, 7 February 2014 (UTC)
@Bene*, Vogone, Legoktm, Ymblanter, The Anonymouse: Any 'crat to comment?--GZWDer (talk) 14:37, 25 February 2014 (UTC)
I'm still not familiar with the "point in time" qualifier. What about "start date" because you mentioned the system has changed to the beginning of this year? Otherwise it might be understood as "this is only true/happened on" some date. -- Bene* talk 21:04, 25 February 2014 (UTC)
Property retrieved (P813) is for the date the information was accessed and is used as part of a source reference. point in time (P585) is for something that happened at one instance. It is not appropriate for these entities which endure over a period of time. Use start time (P580) and end time (P582) if you know the start and end dates. Filceolaire (talk) 21:19, 25 March 2014 (UTC)

Symbol support vote.svg Support if the bot user uses start time (P580) and end time (P582) instead of point in time (P585) --Pasleim (talk) 16:48, 28 September 2014 (UTC)

@Kvardek du: Do you still plan to run the bot? If so, could you please do some test edits again, using start time (P580) and end time (P582) instead of point in time (P585)? --Pasleim (talk) 07:52, 24 May 2015 (UTC)
@Pasleim: it's planned, but not for the moment... The problem I have with French data is that you only have the membership at a moment t, and not with a start time (P580). Kvardek du (talk) 13:20, 25 May 2015 (UTC)
Kvardek du then use retrieved (P813) in the reference and leave out start time (P580) and point in time (P585). Joe Filceolaire (talk) 08:33, 23 July 2015 (UTC)
Filceolaire : yeah but I have a retrieved (P813) t2 which is different from my point in time (P585)... Kvardek du (talk) 15:47, 24 July 2015 (UTC)
If you don't know the 'start time' then leave it out. If you want then you can create a separate item for the document that the data comes from and add the point in time statement to that item then reference the item for that document in the references for the 'located in ... entity' statements. Look on it as the 'point in time' date relates to the info in the document (true on that date).
Note that population figures should have a 'point in time' qualifier to say when that population figure applies since the population figure is not true for a period; it is only true for the day it was measured. Joe Filceolaire (talk) 00:55, 25 July 2015 (UTC)
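
The modelling advice in this thread can be summarised in a small Python sketch. It uses a simplified dictionary layout, not the actual Wikibase JSON format, and the function names are hypothetical. The property IDs P131, P361, P580, P582, P585 and P813 are those named in the discussion; population (P1082) and stated in (P248) are added here as assumptions about which properties the bot would use.

```python
from typing import Dict, Optional


def membership_statement(intercommunality_qid: str, source_qid: str, retrieved: str,
                         start_time: Optional[str] = None,
                         end_time: Optional[str] = None,
                         prop: str = "P131") -> Dict:
    """Commune -> intercommunalité statement (P131, or P361 as suggested above).

    Start and end time are added only when known; no point in time (P585)
    qualifier is used. The access date goes into the reference as P813.
    """
    qualifiers = {}
    if start_time is not None:
        qualifiers["P580"] = start_time
    if end_time is not None:
        qualifiers["P582"] = end_time
    return {
        "property": prop,
        "value": intercommunality_qid,
        "qualifiers": qualifiers,
        "reference": {"P248": source_qid, "P813": retrieved},
    }


def population_statement(population: int, point_in_time: str,
                         source_qid: str, retrieved: str) -> Dict:
    """Population figure, qualified with the date on which it was true (P585)."""
    return {
        "property": "P1082",
        "value": population,
        "qualifiers": {"P585": point_in_time},
        "reference": {"P248": source_qid, "P813": retrieved},
    }
```

Here `source_qid` stands for the separate item describing the source document, on which the "true on that date" information can be recorded as suggested above.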