Wikidata:Requests for permissions/Bot

Shortcuts: WD:RFBOT, WD:BRFA, WD:RFP/BOT
To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks.


Old requests go to the archive.

Once consensus is obtained in favor of granting the bot flag, please post requests at the bureaucrats' noticeboard.

Bot Name | Request created | Last editor | Last edited
SaschaBot_2 | 2015-03-26, 16:28:44 | Jura1 | 2015-03-27, 05:47:22
SaschaBot_1 | 2015-03-20, 19:44:30 | Jura1 | 2015-03-27, 05:49:23
DBpedia-mapper-bot | 2015-03-12, 17:28:56 | Hjfocs | 2015-03-25, 10:28:57
WikiGrok | 2015-03-16, 23:27:31 | Kaldari | 2015-03-31, 20:10:37
CaliburnBOT | 2015-02-20, 12:21:44 | Caliburn | 2015-03-30, 15:02:05
Revibot 3 | 2014-12-07, 12:16:54 | Pasleim | 2015-03-30, 21:34:41
Shyde | 2014-11-29, 15:09:36 | GZWDer | 2015-02-12, 04:19:18
JhealdBot | 2014-09-07, 23:30:46 | Jheald | 2014-11-22, 22:39:39
BthBasketbot | 2014-06-10, 08:17:14 | Bthfan | 2014-08-11, 14:27:02
Fatemibot | 2014-04-25, 08:59:32 | Ymblanter | 2014-09-19, 12:38:16
ValterVBot 12 | 2014-04-11, 19:12:34 | Multichill | 2014-11-22, 17:00:26
Structor | 2014-04-09, 15:50:38 | Ymblanter | 2014-10-15, 06:44:08
Global Economic Map Bot | 2014-01-26, 21:42:37 | Ladsgroup | 2014-06-17, 14:02:29
KunMilanoRobot | 2014-01-21, 19:27:44 | Pasleim | 2014-09-28, 16:48:55

SaschaBot 2

SaschaBot (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Sascha (talk · contribs · logs)

Task/s: Add missing common names to Wikidata.

Function details: To mine for missing common names, I went over all humans in Wikidata, extracted the first part of their English label (e.g., Simone de Beauvoir → Simone), and matched this against a list of all common names in Wikidata. Here is the result: List of common names that seem to be missing from Wikidata.

It should be easy to mine the gender of these common names. I think this would be best done in a later, separate pass, since this could then also check the gender of existing common names in Wikidata. After that step, another bot run would create descriptions (such as Female common name) for all common names in Wikidata that don't have descriptions yet.

Impact: If the bot gets permission to run, it would create 108,636 items. If we insert missing common names only when there are at least 2 people with that name, the bot would create 30,893 items. If we restrict to names with at least 3 people, it would create 18,361 items.

Caveats: If you look at the list, you will see a couple of bogus entries. Some are not in the Latin script, or contain funny characters like ":". I will make sure that these do not get inserted, but wanted to start the discussion now. However, there are also some entries that would not be detectable by a script, such as Empress. What should we do about those? Is there a good tool so that others could help review the list? (I've made the spreadsheet world-editable on Google Docs.)
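
Here is a minimal sketch of the mining step, as pure Python over lists that are assumed to have been extracted from a dump beforehand; the function and variable names are illustrative, not the bot's actual code:

import re
from collections import Counter

LATIN_NAME = re.compile(r"^[A-Za-z\u00C0-\u024F'-]+$")  # crude Latin-script check

def candidate_names(human_labels, existing_name_items, min_people=2):
    """Yield (name, count) for first tokens of human labels that look like
    given names, have at least min_people bearers and no item yet."""
    counts = Counter()
    for label in human_labels:              # e.g. "Simone de Beauvoir"
        first = label.split(" ", 1)[0]      # -> "Simone"
        if not LATIN_NAME.match(first):     # drop non-Latin entries and odd characters like ':'
            continue
        counts[first] += 1
    for name, count in counts.most_common():
        if count >= min_people and name not in existing_name_items:
            yield name, count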

--Sascha (talk) 16:28, 26 March 2015 (UTC)

Hey Sascha. The list is at the moment a mix of first names, last names (e.g. Li), pseudonyms (e.g. Seven) and other entries (e.g. Saint, K). To set proper descriptions and for later usage it is however important that the type of name is known. Do you see a way to figure out the type automatically or should all entries be reviewed by a human? --Pasleim (talk) 17:52, 26 March 2015 (UTC)
Notify User:Jura1 – the name expert in Wikidata --Pasleim (talk) 18:03, 26 March 2015 (UTC)
  • Good idea. I had thought about doing that at some point as well, but I'm glad it's being taken up.
    How about checking the names against some of the lists at Wikipedia? Special:Search/list of given names helps find some.
    WikiProject Names describes how to structure the items.
    To avoid problems, I usually leave out given names that are not first names (Chinese, Korean, Japanese, Hungarian).
    This list provides most existing first names. --- Jura 20:34, 26 March 2015 (UTC)
    BTW, I couldn't resist and created Phil (Q19685923). --- Jura 05:47, 27 March 2015 (UTC)

SaschaBot 1

SaschaBot (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Sascha (talk · contribs · logs)

Task/s: To remove disambiguation suffixes from the Wikidata labels of common-name items. See diff for one sample edit, and a handful of edits by this bot.

Function details: For every Wikidata item that is an instance of common name, the bot goes over the labels in each language. If the label has a suffix in parentheses, it extracts the suffix, and checks it against the whitelist below. If the label suffix is not whitelisted, the bot leaves the label unchanged. Otherwise, the bot strips off the label suffix, and checks whether the item already has a description in the current language. If so, the description is left unchanged. Otherwise, if the item has no description yet in the given language, the bot takes the stripped-off suffix and adds that string as item description.
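
A minimal sketch of that per-item logic, written as a pure function over one item's labels and descriptions; the whitelist argument stands for the suffix list below, and the function name is illustrative only:

import re

SUFFIX = re.compile(r"^(?P<base>.+?)\s*\((?P<suffix>[^()]+)\)\s*$")

def strip_suffixes(labels, descriptions, whitelist):
    """Return updated copies of the label and description dicts, with
    whitelisted parenthetical suffixes stripped from the labels and reused
    as descriptions where no description exists yet."""
    new_labels, new_descriptions = dict(labels), dict(descriptions)
    for lang, label in labels.items():
        match = SUFFIX.match(label)
        if not match or match.group("suffix") not in whitelist:
            continue                        # no suffix, or not whitelisted: leave unchanged
        new_labels[lang] = match.group("base")
        if lang not in descriptions:        # existing descriptions are never overwritten
            new_descriptions[lang] = match.group("suffix")
    return new_labels, new_descriptions

For example, strip_suffixes({"de": "Erik (Vorname)"}, {}, {"Vorname"}) returns ({"de": "Erik"}, {"de": "Vorname"}).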

Impact: If the bot gets permission to run, it will change 1335 labels.

Motivation: I would like to experiment with algorithmic transliteration of entity names, using the transforms in Unicode CLDR. From past experience, I don't think that the actual transliteration can be done fully algorithmically because the quality would be too low. Instead, I imagine a setup where a script will generate candidate labels, and human native speakers would confirm/improve each suggested edit. However, to get there, a first step is fixing the labels of common names in Wikidata so I have a clean data set to work with. In a later step, I'd like to run a similar script on surnames.

Here is the whitelist of suffixes that the bot will strip away. I have considered special-casing "disambiguation" etc., but believe that cleaning up the descriptions would be easier in a second sweep over the items.

Given name
Naam
Nahme
Name
Numm
Nåmen
Patronyme
Virnumm
Vorname
Yutarō
anthroponym
apartigilo
apellido
cognome
cognomen
desambiguación
desambiguação
dezambiguizare
disambiguasi
disambiguation
discretiva
doorverwijspagina
drengenavn
eesnimi
egyértelműsítő lap
eiginnafn
etunimi
fornavn
förnamn
given name
gmina
homonymie
homónimos
ime
imię
izena
jméno
keresztnév
kvinnenavn
křestní jméno
mannsnafn
meno
mjeno
naam
nafn
nama
nama kecil
name
namn
navn
nimi
nom
nombre
nombre propio
nome
nomen
nomo
nomu
nume
nume feminin
név
osebno ime
pigenavn
prenom
prenome
prvé meno
prénom
příjmení
surname
voornaam
žensko ime
Ги
שם
人名
ім'я
име
имя
лӱм
імя
όνομα
мъжко име
значения
значення
осетинское имя

--Sascha (talk) 19:43, 20 March 2015 (UTC)

Comments, anybody?--Ymblanter (talk) 09:19, 25 March 2015 (UTC)
Unless anyone raises any concerns within the next 3 days, I would like to approve this task; it looks fine to me. Vogone (talk) 15:30, 25 March 2015 (UTC)
Support stripping away the suffixes. When adding descriptions, it should be considered that the description to add is not "disambiguation" but "Wikimedia disambiguation page", i.e. the label from Wikimedia disambiguation page (Q4167410). --Pasleim (talk) 18:00, 26 March 2015 (UTC)
I think it's a good idea to strip the suffixes from labels if they happen to have been included there. Depending on the type of items, descriptions in English should be either "name", "family name", "Wikimedia disambiguation page", or for first names ("given name", "female given name", "male given name"), but not a random one. For Q2781139, it should be "male given name". It might be easier to sort these out manually. --- Jura 05:49, 27 March 2015 (UTC)

Popcornbot 3

Popcornbot (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Popcorndude (talk · contribs · logs)

Task/s: Adding descriptions to disambiguation pages.

Code: User:Popcorndude/botcode3

Function details: Adding the labels of Wikimedia disambiguation page (Q4167410) as descriptions for every item in Wikipedia:Category:All_disambiguation_pages lacking descriptions (and possibly adding instance of (P31)). This is essentially the same as Wikidata:Requests_for_permissions/Bot/Popcornbot, except that it would be adding descriptions to disambiguation pages rather than categories. --Popcorndude (talk) 20:46, 15 March 2015 (UTC)
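
For illustration, a hedged sketch of the per-item step with pywikibot; this is not the code at User:Popcorndude/botcode3, and the edit summary and error handling are assumptions:

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
disambig = pywikibot.ItemPage(repo, "Q4167410")
disambig.get()
disambig_labels = dict(disambig.labels)   # e.g. {"en": "Wikimedia disambiguation page", ...}

def add_missing_descriptions(item):
    """Copy the Q4167410 labels into the item's empty description slots."""
    item.get()
    missing = {lang: text for lang, text in disambig_labels.items()
               if lang not in item.descriptions}
    if not missing:
        return
    try:
        item.editDescriptions(missing, summary="add description to disambiguation page")
    except pywikibot.data.api.APIError as err:
        # label/description conflicts with another item are recorded for review
        print(item.getID(), err)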

Support, but I'm not sure it is correct to use all the labels of Wikimedia disambiguation page (Q4167410) as descriptions. Normally we must use "Wikimedia", not "Wikipedia", disambiguation page. Maybe it is better if you select only the labels that contain "Wikimedia". For the conflicts, do you plan to generate a report? --ValterVB (talk) 18:33, 18 March 2015 (UTC)
What do you mean by conflicts? Certainly I can use a different alias, though in this case I am only applying it to disambiguation pages from English Wikipedia, if that makes any difference. Popcorndude (talk) 12:24, 19 March 2015 (UTC)
Conflict = there already exists an item with the same label and the same description. Example: if I want to add the it label "Perraudin" on Q19298367, I get a conflict with Q3375565: «Q3375565 already has label "Perraudin" associated with language code it, using the same description text.» --ValterVB (talk) 13:03, 19 March 2015 (UTC)
Sorry for taking so long to respond. I could certainly make a list of items that cause errors, though I might not be able to easily identify why they caused errors. Popcorndude (talk) 23:41, 25 March 2015 (UTC)
No problem, I don't want to complicate your life :) For me, you can start. --ValterVB (talk) 17:48, 26 March 2015 (UTC)
I'm not sure if you should take all labels of Wikimedia disambiguation page (Q4167410). There is for example the ksh-label „Wat-eß-dat?“-Sigg en de Wikkipeidija, or many labels with a colon. Before starting, you may ask some people to check the labels in their native language. --Pasleim (talk) 18:14, 26 March 2015 (UTC)
Where can I do this? I probably should have done so on Wikidata:Requests_for_permissions/Bot/Popcornbot too. Popcorndude (talk) 22:15, 26 March 2015 (UTC)
You can try it with a comment in the WD:Project chat or ask people who are around in the irc chat. I can tell you that the labels in de, en, fr and gsw are fine. --Pasleim (talk) 22:58, 26 March 2015 (UTC)
I have modified the code to record any pages which cause errors or have more than 1 instance of (P31) (these will not be edited). Popcorndude (talk) 12:04, 27 March 2015 (UTC)

DBpedia-mapper-bot

DBpedia-mapper-bot (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Hjfocs (talk · contribs · logs)

Task/s: Addition of Wikidata-to-DBpedia classes/properties mappings, as discussed in this project chat thread.

Code: User:Hjfocs/add_dbpedia_mapping.py, currently working for a single (Wikidata, DBpedia) mapping pair. If this request is approved, it will scale to all the available mappings.

Function details:
For each (Wikidata, DBpedia) mapping pair, the bot adds the following data (a rough sketch follows the list):

  1. an equivalency claim to a Wikidata Item describing a class or a property in the Wikidata classification schema (AKA ontology). The claim maps to a DBpedia ontology item (see http://mappings.dbpedia.org/server/ontology/);
  2. a qualifier pointing to a human-readable description of the DBpedia ontology item;
  3. a reference stating that the claim was imported from DBpedia.
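
A rough, hedged sketch of one such edit with pywikibot, assuming equivalent class (P1709) as the equivalency property for classes, described at URL (P973) for the human-readable qualifier, and imported from (P143) for the reference; the actual properties are of course up to this discussion, and the item/URI values are passed in rather than hard-coded:

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

def add_mapping(item_qid, dbpedia_class_uri, doc_url, dbpedia_item_qid):
    """Add one Wikidata-to-DBpedia class mapping: an equivalence claim,
    a qualifier with the human-readable description page, and a reference."""
    item = pywikibot.ItemPage(repo, item_qid)
    claim = pywikibot.Claim(repo, "P1709")            # equivalent class (P1628 for properties)
    claim.setTarget(dbpedia_class_uri)                # e.g. a dbpedia.org/ontology/... URI
    item.addClaim(claim, summary="add DBpedia mapping")

    qualifier = pywikibot.Claim(repo, "P973")         # described at URL
    qualifier.setTarget(doc_url)                      # human-readable DBpedia ontology page
    claim.addQualifier(qualifier)

    source = pywikibot.Claim(repo, "P143")            # imported from
    source.setTarget(pywikibot.ItemPage(repo, dbpedia_item_qid))  # the item for DBpedia
    claim.addSources([source])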

--Hjfocs (talk) 17:27, 12 March 2015 (UTC)

Any comments here? If not, I will approve in a couple of days.--Ymblanter (talk) 10:24, 14 March 2015 (UTC)

Reopening, after approval by Ymblanter. --Atlasowa (talk) 09:19, 17 March 2015 (UTC)

Oppose for now:

This request needs more explanation and deliberation, ping Hjfocs. --Atlasowa (talk) 09:19, 17 March 2015 (UTC)


Hi Atlasowa,
Thanks for the feedback.
Let me first highlight what I think is the key point:
This bot is not intended to provide an import facility for third-party data, but only a linkage one.
This has 2 benefits:
  1. Provenance information is kept intact: users can simply check how similar fragments of knowledge are described in different knowledge bases, by browsing through the links. This holds both for humans and machines (since the data is machine-readable);
  2. No need to merge different data models.
Here you can find detailed answers for each question:
  • No indication, how many edits will be performed. scale? "all"?
The edits will only affect the schema Items, so the number will not be big. Currently, we have a total of 114 classes and properties mappings as per the DBpedia mappings wiki, plus 688 classes and 335 properties mappings as per this spreadsheet.
  • No pairing of properties provided for checking.
You can have a look at the referenced DBpedia mappings wiki, which lists the mappings that are already in production.
The referenced spreadsheet content is scheduled to be added to both the DBpedia mappings wiki and Wikidata.
  • No test edits done (100 test edits are customary)
Actually, I have been testing on the Alternative Sandbox Item, by adding a few claims. I don't think I will need many more test edits (certainly not 100).
  • What is the point of this pairing, how will this help wikidata?
Linking to third-party knowledge bases like DBpedia (which contains lots of statements that are not in Wikidata and links to other datasets) facilitates the reuse and consumption of further data, without having to import them into Wikidata. Cf. the key benefits.
  • Is it a preliminary step for other edits/imports/projects?
No, I think it is a standalone action.
  • Earlier proposals for DBpedia imports have been abandoned
I have no knowledge of this; I fear that the DBpedia community was not directly involved in those discussions.
  • If this property mapping is useful, why isn't it done on DBpedia? [2]: "We also fully extract wikidata property pages. However, for now we don’t apply any mappings to wikidata properties." If it's not done on DBpedia, why should it be added to wikidata?
You are referring to an internal project (for which we are looking for feedback from the Wikidata community, that's why it was posted there), which aims at a full integration of Wikidata into DBpedia.
The property mapping in DBpedia is already in production, cf. my reply above.
--Hjfocs (talk) 12:19, 17 March 2015 (UTC)
Hi Hjfocs, thanks for answering. Can you try to give a really precise answer to the question of how many edits/mappings ?
  • "If this request is approved, it will scale to all the available mappings."
  • "Currently, we have a total of 114 classes and properties mappings as per the DBpedia mappings wiki, plus 688 classes and 335 properties mappings as per this spreadsheet."
    • Do you want to do 114 classes and properties mappings?
    • Do you want to do 114 classes and properties mappings plus 688 classes and 335 properties mappings as per google spreadsheet?
    • Do you want to do 114 classes and properties mappings plus classes and properties mappings as per google spreadsheet, minus those that have been classified wrong mapping or uncertain mapping?
Can you give the number of mappings you want to do? --Atlasowa (talk) 14:10, 17 March 2015 (UTC)
Sure, Atlasowa!
The best case scenario would be to use all of them, so the bot will perform at most 114 official mappings + 688 draft class mappings + 335 draft property mappings = 1,137 edits.
As you noticed, however, the entries in the spreadsheet are only partially validated, so I will need extra pairs of eyes.
I believe they will come from the 2 communities, as I plan to upload them both to the DBpedia mappings wiki and to Wikidata. Of course, I will personally double-check them before that.
--Hjfocs (talk) 14:30, 17 March 2015 (UTC)


The details of what is being done are not clear to me. Can you explain why "a reference stating that the claim was imported from DBpedia" is true? A statement that two entries in two different databases agree with each other is different from an entry in DBpedia being imported into WikiData. Also, is there an explanation of what test you do to decide if two entries are equivalent? Jc3s5h (talk) 15:23, 17 March 2015 (UTC)

Hi Jc3s5h,
  1. Since the mapping originates from a DBpedia community effort, I thought that the imported from property would best fit. Do you have any suggestions for a better alternative?
  2. The procedure to mint a new mapping pair combines the following automatic techniques (in order of complexity):
  • String similarity measures (i.e., exact match, Levenshtein distance match);
  • String kernel matching;
  • Logical constraint check (i.e., domain and range);
  • Instance distribution similarity;
  • SVM-based matching, with features such as labels or aliases.
Then, the results need at least a round of human validation, and are finally considered official.
--Hjfocs (talk) 16:10, 17 March 2015 (UTC)
Thanks. My impression is DBpedia created a class or property, and Wikidata independently created a class or property, and the effort you are involved with has discovered that certain classes or properties in the two databases are equivalent. Since the things were created independently, there is no importation involved. Jc3s5h (talk) 20:44, 17 March 2015 (UTC)
This is a great synthesis, Jc3s5h! You pointed out the crucial aspects, thanks!
The bot will perform a schema alignment task. --Hjfocs (talk) 09:23, 18 March 2015 (UTC)
For those linkages of DBpedia and Wikidata, where a mapping is already in production on DBpedia (the 114 classes and properties mappings), it would be appropriate to add "imported from" "DBpedia" (+ ideally "as of date"). @Jc3s5h, Hjfocs: Agreed? --Atlasowa (talk) 11:16, 18 March 2015 (UTC)
Sure, I totally agree. Also, the bot's behavior will be updated, in order to handle the date stamp of the claim. I would add a qualifier with property == point in time and value == date stamp, like in the population of Berlin. Do you agree, Atlasowa? --Hjfocs (talk) 11:50, 18 March 2015 (UTC)
Hi Hjfocs, your test edit looks good to me. But i would welcome more feedback on referencing as "imported from" vs. "stated in" by more competent users. --Atlasowa (talk) 10:24, 19 March 2015 (UTC)
I would suggest to only add mappings that are "already in production" at DBpedia. Further mappings should not be "wrong" or "uncertain". ;-)
Some further links that might be useful:
HTH, --Atlasowa (talk) 10:24, 19 March 2015 (UTC)
Thanks for the pointers Atlasowa, they are really useful.
Agreed WRT the automatically generated mappings: they still need human validation, and this will be done first on the DBpedia community side. Then, I can propose the linkage to Wikidata.
Looking forward to getting more feedback on which property best fits the reference. --Hjfocs (talk) 17:54, 19 March 2015 (UTC)
What is the current situation here?--Ymblanter (talk) 09:19, 25 March 2015 (UTC)
I'm waiting for feedback on which property to use for referencing. If no one objects, I will proceed with imported from. --Hjfocs (talk) 10:28, 25 March 2015 (UTC)

WikiGrok

WikiGrok (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Bmansurov (WMF) (talk · contribs · logs) or Ryan Kaldari (WMF) (talk · contribs · logs)

Task/s: Add aggregated claim data from WikiGrok

Code: Bot code is here. Extension code is here.

Function details: As I've mentioned on Wikidata project chat, the WMF mobile web team has been experimenting with micro-contribution interfaces for adding metadata to Wikidata from within Wikipedia articles. These experimental interfaces are very similar to Magnus's Wikidata game, but instead of posting the results to Wikidata immediately, we have been collecting the results in a database so that they can be aggregated for better accuracy. Now that we have collected a large number of responses, we would like to try posting some of the aggregated data to Wikidata (less than 1000 edits) and get community feedback on the quality and usefulness of the data. This will help us to tune the feature in preparation for continuous larger-scale use in the future (which is not part of this bot request). You can see some of the results from our tests so far at meta:Research:WikiGrok. You can also view a sample of the first 100 edits that would be made by this bot. These edits will be to add claims of the following types:

  • occupation: author
  • occupation: film actor
  • occupation: television actor
  • instance of: live album
  • instance of: studio album

In addition, in the cases where we are adding "instance of: live album" or "instance of: studio album", we will delete any existing "instance of: album" claims (as discussed at Project Chat). We will also avoid creating any duplicates of existing claims. --Ryan Kaldari (WMF) (talk) 23:28, 16 March 2015 (UTC)
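
For illustration only, a hedged sketch of how a single aggregated response could be written with pywikibot, including the replacement of a generic "instance of: album" claim; the real bot code is linked above, and the function name and edit summary here are assumptions (P31 = instance of, Q482994 = album):

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
GENERIC_ALBUM = "Q482994"   # plain "album", replaced when a more specific class is added

def push_claim(item_qid, prop, target_qid, summary="aggregated WikiGrok response"):
    """Add prop -> target_qid to the item, skipping duplicates and removing
    a generic 'instance of: album' claim when a more specific one is added."""
    item = pywikibot.ItemPage(repo, item_qid)
    item.get()
    existing = item.claims.get(prop, [])
    if any(c.getTarget() and c.getTarget().getID() == target_qid for c in existing):
        return                                            # claim already present
    if prop == "P31":
        for c in existing:
            if c.getTarget() and c.getTarget().getID() == GENERIC_ALBUM:
                item.removeClaims([c], summary=summary)   # drop the generic album claim
    claim = pywikibot.Claim(repo, prop)
    claim.setTarget(pywikibot.ItemPage(repo, target_qid))
    item.addClaim(claim, summary=summary)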

Screenshot: WikiGrok version B (tagging). The message "You just contributed to Wikipedia, thanks!" is wrong; the contribution is supposed to be pushed to Wikidata.

For reference:

Some comments:

  • Scale? "Now that we have collected a large number of responses" - You give no number of claims to be pushed to wikidata except "100 claims (as a test run)". Are the 100 test edits the selected best-of a "large number of responses"? Or a random sample? How many would follow after the test? ([5] 7199 unique pages had at least one version (a) played on them / This number is 9013 for version (b)...?)
    • We currently have ~36,000 responses. The vast majority of those are not yet usable since they have not been corroborated by multiple people. Depending on how we do the aggregation, we could be looking at anywhere from about ~200 edits to ~1000 edits total for the current data set. That number depends on what values we set for the two threshold variables (number of responses for the item, and percentage agreement), which in turn depends on the feedback we get regarding the first 100 edits. The first 100 edits will be a random selection using a relatively conservative set of threshold values: >=5 responses, >=80% agreement. Those threshold values give us 602 vetted claims total. If the community was satisfied that the first 100 edits are high quality, we would then ask to post the remaining 502. If the community was not satisfied that the first 100 edits were high quality, we would ask to try again with a more conservative set of threshold values. We can create a separate bot request for potential higher-volume use in the future (which may or may not ever happen). Ryan Kaldari (WMF) (talk) 23:42, 18 March 2015 (UTC)
  • References? No references afaics. Will the "continuous larger-scale use in the future" also push data without any source into Wikidata? Help:Sources: "The majority of statements on Wikidata should be verifiable insofar as they are supported by referenceable sources of information such as a book, scientific publication, or newspaper article."
    • It is correct that none of the edits will include references. We evaluated methods for supporting the addition of references from the interface, but we were not able to come up with any practical solutions for a small mobile interface (other than using "XX Wikipedia" which isn't a real reference). I know this isn't ideal and we are interested in hearing the community's feedback on this issue. Ryan Kaldari (WMF) (talk) 18:36, 17 March 2015 (UTC)
  • Claims generation? Wikipedia readers are asked to decide Yes/No/Don't know to proposed claims (for wikidata) on the Wikipedia article page. The proposed claims seem to come from wdq.wmflabs.org. Just like the Wikidata Game https://tools.wmflabs.org/wikidata-game/#mode=occupation (where users need to log in)?
    • The initial prototype of WikiGrok used the same engine as Wikidata Game for generating potential claims, but we now have our own engine that is completely separate and does not rely on Tool Labs. Ryan Kaldari (WMF) (talk) 18:36, 17 March 2015 (UTC)
  • Data quality? mw:Extension:MobileFrontend/WikiGrok#Pushing_responses_to_Wikidata: "Before we start pushing response data to Wikidata, we will be analyzing the quality of the responses along different axes: logged-in vs. anon, high edit count vs. low edit count, version A interface vs. version B interface, etc. We will then use that data to figure out how to maximize the quality of the data we push to Wikidata via a scoring system. For example, we could implement an algorithm that selects responses to push to Wikidata as so:(...) If number of responses > 1, and composite response score > 65%, then push response to Wikidata." Where is this analysis? Is your "aggregated data" from WikiGrok indeed from >1 responses?
    • I haven't been that involved in the analytics aspect of the project, but here's the gist of what I know so far... First, we don't have a ton of data yet, so our conclusions are not bullet-proof. The results of this bot run will be an important part of the over-all analysis of how or if we move forward with the feature. Our existing analytics data is a mix of comparison with existing Wikidata claims and hand-checking (neither of which are perfect). So far, we have been pleasantly surprised by the overall quality of the results. Especially surprising is that we haven't seen any real difference in response quality from logged-in vs. anon users. I'll ping our analytics person and ask them to make sure that all our results so far are posted on the test pages. Ryan Kaldari (WMF) (talk) 18:36, 17 March 2015 (UTC)

mw:Extension:MobileFrontend/WikiGrok#Response_Analysis gives curious definitions:

  • "Quality: For the questions we do not know the ground truth an aggregated response is considered high quality if the submitted response to WikiData is not reverted two weeks after the data is written to WikiData. The aggregated response is considered low quality otherwise."
So, someone, supposedly a human someone, is supposed to check all the WikiGrok bot edits for accuracy in two weeks, but if nobody cares to clean up after the bot, then the WikiGrok data is considered "high quality"... uhm, totally quality, sure ^^. But there is still the "ground truth", wow, what is that?
"Ground truth" is based on comparison with existing Wikidata claims. We fully realize this is an imperfect measure, which is why we are also doing analysis based on hand-coding of results (at a small scale), and analysis based on test submissions to Wikidata. We will be asking the Wikidata community on Project Chat to scrutinize all the edits from this bot, so we're hoping that at least a significant percentage of the edits will be looked at within 2 weeks. If you feel like that isn't realistic, let us know. Ryan Kaldari (WMF) (talk) 18:36, 17 March 2015 (UTC)
  • "Percentage agreement with WikiData: For each question, percentage agreement with WikiData is the fraction of responses for that question that match the ground turth as available in WikiData. This measure is defined only when the ground truth exists."
So, Wikidata = ground truth... (or "turth" ;-) uhm. Do you check where this "wikidata turth" came from? Via Widar from the same Wikidata Game? Or via bot imported from person data at (italian?) Wikipedia? Or manual edits by registered users? Or: Doesn't mattter because if it's on wikidata it's true?^^
No, we do not check where the existing claims come from and you're right that there's a chance our claims will actually be more reliable than the existing claims. We kind of have a chicken-and-egg problem in this regard. We've discussed the idea of just using WikiGrok to verify existing Wikidata claims, rather than creating new claims, and I would be interested to hear your thoughts on this idea. Ryan Kaldari (WMF) (talk) 18:36, 17 March 2015 (UTC)

I'm starting to wonder if it wouldn't be better to look at WikiGrok data disagreeing with "wikidata turth". Is it available? --Atlasowa (talk) 15:07, 17 March 2015 (UTC)

The existing WikiGrok data isn't currently publicly available, but I don't see why it couldn't be. I'll ask the project manager if it would be feasible to post the data somewhere. Thanks for your detailed response to the bot request. We are very interested in community feedback on this feature and want to make sure that it is something that the Wikidata community actually thinks is useful to the project. I know there are mixed feelings about the impact of tools like Widar and Wikidata Game, but we are hoping to improve on those models and strike a good balance of volume and quality. One advantage that we have over those tools is that we have a huge audience (all visitors to mobile Wikipedia), so we can potentially have both higher quality (through aggregation) and higher volume (in the long run). The feature is still considered an experiment, however, and will be killed if it doesn't prove to be useful. Ryan Kaldari (WMF) (talk) 18:36, 17 March 2015 (UTC)
@Atlasowa: I've posted an anonymized dump of the existing test data on github. Ryan Kaldari (WMF) (talk) 22:14, 30 March 2015 (UTC)

Thanks for your response, Ryan Kaldari (WMF). And thank you especially for being so upfront about potential problems. BTW, I primarily self-identify as a Wikipedian. Where do I start?

  • "We've discussed the idea of just using WikiGrok to verify existing Wikidata claims, rather than creating new claims, and I would be interested to hear your thoughts on this idea." Yes, sounds good, but using WikiGrok to "verify"? If we have no sources supporting the WikiGrok correction, we just have 2 contradicting claims: 1 old wikidata "truth", 1 new WikiGrok claim. To decide for the one or the other, someone has to investigate the claim. And if no reference is added, this investigation will have to start anew everytime: when this claim is changed next (vandalised?), or appears in a list of mismatches from cross-checking databases. And this investigation again and again and again.
    • @Atlasowa: That's true, and one of the reasons we've been reluctant to pursue changing or deleting existing claims. We mainly just want to help with filling in the low-hanging fruit of missing claims that can get unanimous or near-unanimous support for being added. Like the early days of Wikipedia, most of the current work on Wikidata is simply around filling in "common knowledge" (which, of course, is imperfect). Once there is a broad coverage of "common knowledge", I'm sure there will be more emphasis on adding references, qualifiers, ranks, and settling disputed information. As a Wikipedian myself it took me a while to get used to this difference, but the Wikidata community as a whole seems to be OK with it. I've added over 1000 claims myself with my Wikidata community account, and so far none of them have been reverted for lacking references. Ryan Kaldari (WMF) (talk) 23:24, 19 March 2015 (UTC)
  • "We will be asking the Wikidata community on Project Chat to scrutinize all the edits from this bot, so we're hoping that at least a significant percentage of the edits will be looked at within 2 weeks. If you feel like that isn't realistic, let us know." Asking at a prominent place for handcoding a list of 100 edits is a completely different thing than having a bot do 100 live edits and expecting "someone" to check this in 2 weeks. The last feels to me like throwing a handful of sand in the ocean. The first seems realistic, for a couple of times, at that scale (not "potential higher-volume use in the future").
    • That sounds like a decent idea, although a bit of a departure from the normal bot approval process. Is there really that much of a difference between asking people to look at a list of 100 possible changes and 100 actual changes? The advantage of using actual changes is that we can automatically measure how many of those 100 claims still exist after 2 weeks (or whatever time-frame). Analyzing a haphazard discussion of an on-wiki list is much less conducive to generating quantifiable results. Also, keep in mind that all of the edit summaries will link to an FAQ page where people can give more detailed feedback. FWIW, we've done some limited hand-coding of the results ourselves and are pretty happy with the results. If anything, they should at least be more reliable than the Wikidata game. Ryan Kaldari (WMF) (talk) 23:24, 19 March 2015 (UTC)
    • I went ahead and posted the first 100 potential edits: Wikidata:Requests for permissions/Bot/WikiGrok/First edits. We ended up deciding on a slightly more conservative set of threshold values to start with, specifically 10 responses instead of 5. If those look good, we can always lower it for future test pushes. Ryan Kaldari (WMF) (talk) 22:23, 20 March 2015 (UTC)
  • "We evaluated methods for supporting the addition of references from the interface, but we were not able to come up with any practical solutions for a small mobile interface (other than using "XX Wikipedia" which isn't a real reference). I know this isn't ideal and we are interested in hearing the community's feedback on this issue." This is really the most important issue with wikidata imho. Unless easy referencing can be fixed, wikidata quality will only become worse. See my practical comments at Wikidata:Referencing improvements input and Wikidata talk:Primary sources tool. In terms of interface, i highly recommend to have a look at the tool CitationHunt, which was built and presented a week ago at en:Wikipedia:Village_pump_(idea_lab)#CitationHunt, very concise and with topic selection for users by category.

--Atlasowa (talk) 09:42, 19 March 2015 (UTC)

  • I took a look at CitationHunt. It looks like a great tool for desktop users, but not really that useful on mobile. The main challenges with adding references on mobile are that you usually need to have multiple tabs or windows open at once and creating references in the correct syntax typically requires presenting the user with a large complicated form to fill out. One of the ideas that we discussed for a future version of WikiGrok is an interface to browse the existing references in an article and choose one that applies. The main concerns with this type of feature were that (1) the added complexity to the workflow would deter people from using it, and (2) that people would just choose a random reference without checking that it actually supported the claim. We discussed this issue with the Wikidata development team as well and came to the conclusion that filling in missing claims with unreferenced information was better than having no information at all. I'm not 100% satisfied with that conclusion, but it seems to reflect the current practices on Wikidata (given that 90% of current claims have no references). BTW, I asked our analytics person to post more detailed analysis of our existing results, and she says she's going to do that soon. I also asked the PM about publicly posting our existing data (the ~36,000 responses). She said that was fine as long as the data is anonymized, so I'll post that data shortly. Ryan Kaldari (WMF) (talk) 23:24, 19 March 2015 (UTC)
  • Oppose approval due to absence of references. I wouldn't let my auto mechanic use a pliers when a wrench is required. References are necessary. Any tool that cannot include references is the wrong tool for the job. Jc3s5h (talk) 17:54, 24 March 2015 (UTC)
    • @Jc3s5h: Of all the bot requests this year that were for inserting claims, not a single one was even requested to use references before being approved. Why is this request being treated differently? There are no policies or guidelines stating that claims must have references (or that bots must use references) and the vast majority of current claim insertions (by both users and bots) do not include references. The purpose of this feature, like the Wikidata game, is to help fill in "common knowledge" on Wikidata. Unlike Wikidata game, however, we are actually making sure that the knowledge is "common" by requiring many people to agree on it first. Ryan Kaldari (WMF) (talk) 18:55, 24 March 2015 (UTC)
      That oppose seems weird given the scope of this request and current practice.
      We need to be able to follow the flow of information. Please make some sort of edit summary that links to an information page about WikiGrok, link to the article people were on when they were asked the question, and include a link to the (two?) users who approved the claim. We'll all be SUL soon so you can probably just link to local usernames. Multichill (talk) 20:46, 24 March 2015 (UTC)
      • @Multichill: I've created a task for linking to an information page from the edit summaries.[6] Linking to the users would be difficult since we're probably going to require at least 5 people to agree on any claim before submitting it. Even worse, most of those users are going to be IPv6 IP addresses. We have a plan to create a CheckUser style interface that will let you look up all the users that contributed to a specific WikiGrok revision, but we don't want to build that interface until we are sure that we want to actually make WikiGrok into a real feature (instead of just an experiment). This bot request is just for submitting the data from our previous test. It's a relatively small amount of data and quite unlikely to have been a vandalism target. (It's probably the most inefficient way you could possibly vandalize a Wikimedia project since you have to submit numerous false claims under different users.) Ryan Kaldari (WMF) (talk) 22:48, 25 March 2015 (UTC)
      If the user interface of WikiGrok makes it clear that only common-knowledge information should be added, fine. Some bots insert data from reasonably reliable sources, so an added indication that the information was imported from the reliable source is sufficient. Most of the bot edits I've looked at were for birth and death dates, and those were a disaster. The bots didn't know the difference between the Julian and Gregorian calendars. It was often difficult to investigate what the real birth date was because no citation was provided. Jc3s5h (talk) 12:52, 25 March 2015 (UTC)
Screenshot: user interface of WikiGrok (version B), tagging.

Just look at the user interface. The reader is at a Wikipedia article, WikiGrok pops up and asks "which of these tags describe <article person>?" The Wikipedia article is right there for him to read, above the task, and the proposed tags are actually harvested by WikiGrok from this same Wikipedia article. It is a leading question. No wonder that there is hardly any disagreement at all with "ground truth": meta:Research:WikiGrok/Test3#Number of responses vs. Percentage Agreement. And the reader is not asked to look for sources (or even look at Wikipedia refs) before he tags Paul Rand as a graphic designer. Think about it. I would really like to know: what would happen if WikiGrok proposed that readers tag Sergej Rachmaninov as: homosexual, atheist, occupation cannibalism? Maybe some disagreements? What if you present the user with a doctored Wikipedia article above this WikiGrok task, which reads "Sergej Rachmaninov was a famous Russian cannibal and a convicted homosexual and atheist." (no refs, no sources) How many will then tag Rachmaninov as: homosexual, atheist, occupation cannibalism? 80%? Or less? *It must be true, it's on Wikipedia* I really wonder. --Atlasowa (talk) 15:22, 25 March 2015 (UTC)

@Ryan Kaldari (WMF), "There are no policies or guidelines stating that claims must have references" - Come on, I already quoted this to you on this very page:

  • Help:Sources: "The majority of statements on Wikidata should be verifiable insofar as they are supported by referenceable sources of information such as a book, scientific publication, or newspaper article." See also
  • Help:Statements#Add only verifiable information: "Wikidata is not a database that stores facts about the world, but a secondary knowledge base that collects and links to references to such knowledge. This means that Wikidata does not state what the population of Germany actually is; it simply provides the information on what the population of Germany is according to a specific source, such as the The World Factbook (Q11191) CIA World Factbook. As such, most statements should be verifiable by referenceable sources of information like a book, scientific publication, or newspaper article. In Wikidata, references are used to point to specific sources that back up the data provided in a statement."
  • Re: "the Wikidata community as a whole seems to be OK with it. I've added over 1000 claims myself with my Wikidata community account, and so far none of them have been reverted for lacking references." Did it ever occur to you, that maybe there are not masses of wikidata users checking up on your edits and deciding not to revert you? But rather that almost nobody watches wikidata items? Not even the most obvious vandalism by the most vandalism-prone IP editors is patrolled at recent changes [7][8][9]. Don't even ask about patrolling widar edits. Or the bots that do 90% of wikidata editing. And did you notice that it is not easy to add references to wikidata, because there are no tools that make this easier for normal human beings. "The main challenges with adding references on mobile are that you usually need to have multiple tabs or windows open at once and creating references in the correct syntax typically requires presenting the user with a large complicated form to fill out." That would have really helped normal, non-mobile, desktop, wikitext editors, which are the majority of productive editors. --Atlasowa (talk) 16:34, 25 March 2015 (UTC)
    • @Atlasowa: I'm aware that claims should be verifiable with references. I used to be an admin on Wikidata back in 2012 and helped write the guidelines. There is no requirement currently that claims must be referenced. Your suggestion that such claims should be reverted is not based in policy. If it were, I'm sure someone would have written a bot to delete all unreferenced claims from Wikidata (the vast majority of them), and tools like Widar, Autolist, and Wikidata Game would have been banned a long time ago. I agree that Wikidata needs better interfaces for adding references, but that is outside the scope of this bot permission request. This request is for inserting less than 1000 unreferenced claims (100 to start with), restricted to 5 specific types, which have achieved consensus from Wikipedia readers. Its scope is smaller than a lot of single Autolist actions (which don't require bot approval). Nothing in this bot request is related to cannibalism, homosexuality, or atheism. Yes, the accuracy of the claims will be influenced by the accuracy of Wikipedia, but that's why we are choosing relatively uncontroversial and easy to verify statements. The same can't be said for tools like Autolist which make thousands of edits based on nothing but Wikipedia category inclusion (which can often be quite subjective). Let's discuss the actual cases involved in this bot request rather than hypothetical cases. Ryan Kaldari (WMF) (talk) 18:01, 25 March 2015 (UTC)

@Multichill, Pasleim, Ricordisamoa, Ymblanter: I would like to ask for 2nd opinions on this bot request. So far the only concrete reason that has been raised for objecting to this request is that the edits will not be referenced, even though bots are not normally required to add references. Is that a valid reason to block the request? Are there other concerns that need to be addressed? Keep in mind that this bot request is to make less than 1000 edits total (the first 100 of which are shown here). Ryan Kaldari (WMF) (talk) 23:09, 30 March 2015 (UTC)

Having references is best practice but not a requirement, and I think the bot should be approved.--Ymblanter (talk) 09:59, 31 March 2015 (UTC)
@Ryan Kaldari (WMF): While I'm eager to support verification of sources by multiple users, I cannot abstain from raising some concerns about this particular task: it is certainly possible to have a bot import fake data by adding them to Wikipedia, but even imported from (P143) can help tracking data down to their original sources; as far as I understand from the above description of the task, the bot would allow malicious manipulation of our knowledge base in an almost anonymous way; indeed references are not required, but approving a bot that does not even let users know where its statements come from would constitute a dangerous precedent. For these reasons, I'd rather have an additional level of verification which would require a (auto)confirmed user to ultimately approve every statement.
@Ymblanter: Please do not approve controversial tasks unless all concerns have been dispelled.
--Ricordisamoa 10:26, 31 March 2015 (UTC)
@Ricordisamoa: Thanks for your feedback. Unfortunately, I don't think requiring an additional level of verification from (auto)confirmed users would work very well in practice. So far I haven't even been able to get people to verify the 100 claims at Wikidata:Requests for permissions/Bot/WikiGrok/First edits, despite asking on the Project Chat and a mailing list. I had to verify most of them myself and it still isn't finished. If ultimately we only trust (auto)confirmed users to vouch for these claims, we should just ask those users the questions to start with and not bother asking readers. The downside to that approach, however, is that we would lower the volume of participation and incoming data by an order of magnitude (or probably several orders of magnitude). We do have several ideas for addressing concerns with data quality and manipulation, such as:
  1. Creating an interface to view all the users who contributed to a particular claim being added. We would then include a link from each WikiGrok edit summary to the corresponding entry in this interface.
  2. Giving different users different weights. Autoconfirmed users could have a higher weight than other users. For example, it might take only 2 autoconfirmed users to verify a claim, but 10 non-autoconfirmed users.
  3. Testing users with claims we know are wrong. This would be quite tricky to implement (since it's always hard to know what data is accurate and what isn't), but theoretically we could present users with bogus data, and if they agree with it, they would no longer be able to use WikiGrok.
What are your thoughts on those ideas?
I would like to clarify, though, that this bot request is just for pushing up the existing data from our test run (less than 1000 edits). We want to push this data in order to get a better understanding of the quality of the data. That information, in turn, will help us decide how many layers of quality control make sense to put on the feature. The evaluation that we've done so far (see meta:Research:WikiGrok) leads us to believe that aggregation of responses is a powerful and effective mechanism for ensuring accuracy (at least for non-controversial questions). We now want to put some of that data into the wild and get the Wikidata community to look at it and see what the feedback is (and whether any of the claims are reverted). We will not be moving ahead with high-volume WikiGrok submissions without conducting further community consultation, adding additional safeguards, and creating a new bot permission request. Kaldari (talk) 19:39, 31 March 2015 (UTC)

CaliburnBOT

CaliburnBOT (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Caliburn (talk · contribs · logs)

Task/s: Adding Commons category (P373) to pages in this category and this category. (and other similar categories on non-English Wikis on request)

Code: User:Caliburn/code.py (the actual python file is stored locally on my computer, so this may not represent the final version, or the version that is being used) and User:Caliburn/code.py/fi

Function details: Basically, it loads a page from the category specified in the short description, uses mwparserfromhell to get the template parameter, then adds its value as Commons category (P373) on the corresponding item. You can see some working diffs: here and here. I did run into a few problems, but those have been resolved and the code has been retested. Note: I would do this on my Caliburn account, but it's currently being renamed. I will post a confirmation signature when I can log in. CaliburnBOT (talk) 12:21, 20 February 2015 (UTC)
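
A hedged sketch of that harvesting step, already using the membership test Ricordisamoa suggests below rather than a chain of == / or comparisons; this is not the exact contents of User:Caliburn/code.py, and the edit summary is illustrative:

import pywikibot
import mwparserfromhell

COMMONS_TEMPLATES = ("Commons category", "Commons cat")

def commons_category_from(page):
    """Return the category named in the page's {{Commons category}} template, if any."""
    wikicode = mwparserfromhell.parse(page.text)
    for template in wikicode.filter_templates():
        if template.name.strip() in COMMONS_TEMPLATES and template.params:
            return template.params[0].value.strip()
    return None

def add_commons_category(page):
    """Copy the template value to Commons category (P373) on the page's item."""
    value = commons_category_from(page)
    if value is None:
        return
    item = pywikibot.ItemPage.fromPage(page)
    item.get()
    if "P373" in item.claims:            # already set, nothing to do
        return
    claim = pywikibot.Claim(item.repo, "P373")
    claim.setTarget(value)
    item.addClaim(claim, summary="add Commons category from the enwiki template")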

Confirmation: --George (Talk · Contribs · CentralAuth · Log) 12:52, 20 February 2015 (UTC)

  • On hold - making large improvements and modifications. This might take some time. --George (Talk · Contribs · CentralAuth · Log) 12:56, 22 February 2015 (UTC)

@Caliburn: your script is wrong, because if template.name == "Commons category" or "Commons cat" or ... will always evaluate as True. You should probably write something like: if template.name.strip() in ("Commons category", "Commons cat", ...). --Ricordisamoa 10:30, 30 March 2015 (UTC)

@Ricordisamoa: Thanks! :D The code is probably badly written, as this is my first attempt at a bot. I will make the change. Otherwise, how does it look? --George (Talk · Contribs · CentralAuth · Log) 15:02, 30 March 2015 (UTC)

Revibot 3

Revibot (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: -revi (talk · contribs · logs)

Task/s: Fix double redirect

Code: mw:Manual:Pywikibot/redirect.py

Function details: It's simple: the bot will retrieve the list of double redirects and try to fix them, unless a redirect is circular. (I am running an initial test run now.) — Revi 12:16, 7 December 2014 (UTC)

Note: Moe Epsilon's userpages are circular redirects, which means the bot cannot solve them. — Revi 12:23, 7 December 2014 (UTC)
On hold: phab:T77971 is blocking this task. — Revi 15:19, 9 December 2014 (UTC)
@-revi: I've been doing this task for a couple of weeks. If you want to take it over, I've published the code here --Pasleim (talk) 21:33, 30 March 2015 (UTC)

Shyde

Shyde (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Gallaecio (talk · contribs · logs)

Task/s:

  • Update of the latest stable version of free software. My initial plan is to support videogames from the Chakra repositories that exist in Wikidata already. For games that do not exist in Wikidata, I may create an entry for them if a Wikipedia article exists about them. Later, I plan to extend the software list to other types of Chakra software that is also present in Wikidata or the English Wikipedia.

Code:

Function details:

  • For each piece of software that the script supports (current list), the bot will (see the sketch after this list):
    • Add the latest stable version to the version property of the software item if that version is not yet present.
    • Add a release date qualifier to the latest stable version value if it lacks one.
    • Add a URL reference to the latest stable version value if it lacks a reference.
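
A hedged sketch of one such update with pywikibot, assuming software version (P348), publication date (P577) and reference URL (P854) are the properties meant; function names and summaries are illustrative:

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

def add_version(item_qid, version, release_date, source_url):
    """Add a new stable version with a release date qualifier and a URL reference,
    unless that version is already present on the item."""
    item = pywikibot.ItemPage(repo, item_qid)
    item.get()
    for existing in item.claims.get("P348", []):
        if existing.getTarget() == version:
            return                                      # version already recorded
    claim = pywikibot.Claim(repo, "P348")               # software version
    claim.setTarget(version)                            # e.g. "2.4.1"
    item.addClaim(claim, summary="add latest stable version")

    released = pywikibot.Claim(repo, "P577")            # publication date
    released.setTarget(pywikibot.WbTime(year=release_date.year,
                                        month=release_date.month,
                                        day=release_date.day))
    claim.addQualifier(released)

    ref = pywikibot.Claim(repo, "P854")                 # reference URL
    ref.setTarget(source_url)
    claim.addSources([ref])
    # Pasleim's suggestion below: the latest version could additionally be set
    # to preferred rank, e.g. via claim.changeRank('preferred')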

Motivation:

  • I was writing a script to detect which games (and possibly regular applications in the future) are out of date in the Chakra repositories. I realized that such information, if published in Wikidata, could benefit a wide audience. Since I have some skills with Pywikibot, I thought that making a new script that updates this information in Wikidata would be fun, and it would help me to get to know Wikidata better.

--Gallaecio (talk) 15:09, 29 November 2014 (UTC)

It would be good if you could set the rank of the latest version to "preferred" and all other ranks to "normal". --Pasleim (talk) 18:17, 6 December 2014 (UTC)
Any progress here?--Ymblanter (talk) 16:40, 11 February 2015 (UTC)
@Gallaecio:--GZWDer (talk) 04:19, 12 February 2015 (UTC)

JhealdBot

JhealdBot (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Jheald (talk · contribs · logs)

Task/s: To add about 35,000 new topic's main category (P910) / category's main topic (P301) property pairs based on matches on Commons category (P373)

Code: not yet written

Function details: About 35,000 potential new topic's main category (P910) / category's main topic (P301) pairs have been identified, based on a unique Commons category (P373) shared by one category-like item and one article-like item, where the article-like item does not currently have any topic's main category (P910) property set.

A preliminary sample, for Commons cats starting with the letter 'D', can be found at User:Jheald/sandbox.

Still to do, before starting editing, would be to remove "List of" articles, as these should not be the category's main topic (P301) of a category; and also to check the cats for any existing category's main topic (P301) and category combines topics (P971) properties set. -- Jheald (talk) 23:30, 7 September 2014 (UTC)
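
A hedged sketch of what adding one such pair could look like with pywikibot, once an (article item, category item) pair has been matched on a unique Commons category (P373); the checks mirror the to-do points above, and all names and summaries are illustrative:

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

def link_pair(article_qid, category_qid):
    """Add topic's main category (P910) / category's main topic (P301)
    between a matched article item and category item."""
    article = pywikibot.ItemPage(repo, article_qid)
    category = pywikibot.ItemPage(repo, category_qid)
    article.get()
    category.get()

    if article.labels.get("en", "").startswith("List of"):
        return   # lists should not become a category's main topic
    if "P910" in article.claims:
        return   # article already links a main category
    if "P301" in category.claims or "P971" in category.claims:
        return   # category already has a main topic or combines topics

    to_category = pywikibot.Claim(repo, "P910")
    to_category.setTarget(category)
    article.addClaim(to_category, summary="add topic's main category (matched on P373)")

    to_topic = pywikibot.Claim(repo, "P301")
    to_topic.setTarget(article)
    category.addClaim(to_topic, summary="add category's main topic (matched on P373)")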

Would you please make several dozen trial contributions?--Ymblanter (talk) 15:54, 24 September 2014 (UTC)
@Jheald: is this request still current? Or can it be closed? Multichill (talk) 17:13, 22 November 2014 (UTC)
@Multichill: It's not near the top of my list. I probably will get back to it eventually, but I don't see topic's main category (P910) / category's main topic (P301) pairs as so important, if we will have items on Commons for Commons categories. And I'd probably use QuickStatements, at least for the test phase. So the request can be put into hibernation for the moment. Jheald (talk) 22:39, 22 November 2014 (UTC)

BthBasketbot

BthBasketbot (talk · contribs · SUL · block log · user rights log · user rights management)
Operator: Bthfan (talk · contribs · logs)

Task/s: Import basketball players team history from Template:Infobox basketball biography from English Wikipedia

Code: User:Bthfan/bot_code.py

Function details: This bot is using pywikibot and is based on the harvest_template.py script (see https://git.wikimedia.org/blob/pywikibot%2Fcore.git/HEAD/scripts%2Fharvest_template.py). I modified the original code a lot to import the team history of a basketball player from Template:Infobox basketball biography on the English Wikipedia. That template has years1, team1, years2, team2, ... to specify the individual teams a player has played for and in which years. This bot will combine a years* property with one team* property to create the following Wikidata claim:
member of sports team (P54): team name
with qualifiers: start time (P580): (start year gets extracted from years*, the bot looks for a four-digit number); end time (P582): (end year gets extracted from years*, the bot looks for a four-digit number; if there is only one number it assumes start year = end year)

The bot re-uses existing member of sports team (P54) entries if there are no qualifiers attached to that claim yet. This reuse is needed as some basketball players already have a few member of sports team (P54) claims, as other bots imported for example categories like https://en.wikipedia.org/wiki/Category:Olimpia_Milano_players (every player in such a category got a member of sports team (P54) entry). But those entries have no start and no end date and thus lack some information that's included in the infobox.

The bot code is not completely finished yet; it needs the patch from https://gerrit.wikimedia.org/r/#/c/125575/ to use the editEntity and JSON feature, as far as I can see. The problem is that some players have played for two different teams in one year. Then the template entry in Wikipedia looks like this:
years1=2012
team1=Team A
years2=2012
team2=Team B

So I may need to reorder existing member of sports team (P54) claims so that the order of items is correct and corresponds to the order in the template/infobox. This is currently not possible with the existing pywikibot code (I would need to access the API function wbsetclaim, as this one allows one to set the index of a claim). --Bthfan (talk) 08:17, 10 June 2014 (UTC)
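
A hedged sketch of the years/team pairing described above, leaving out the reuse of existing unqualified claims and the reordering that the Gerrit patch is needed for; the regular expression and names are illustrative:

import re
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
YEAR = re.compile(r"\d{4}")

def add_team_membership(item, years_value, team_qid):
    """Add member of sports team (P54) with start/end time qualifiers parsed
    from an infobox 'yearsN' value such as '2012-2014' or '2012'."""
    years = YEAR.findall(years_value)
    if not years:
        return
    start, end = int(years[0]), int(years[-1])   # a single year means start == end
    claim = pywikibot.Claim(repo, "P54")
    claim.setTarget(pywikibot.ItemPage(repo, team_qid))
    item.addClaim(claim, summary="import team history from enwiki infobox")
    for pid, year in (("P580", start), ("P582", end)):   # start time, end time
        qualifier = pywikibot.Claim(repo, pid)
        qualifier.setTarget(pywikibot.WbTime(year=year))
        claim.addQualifier(qualifier)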

BTW: Basically I'm waiting for the patch at https://gerrit.wikimedia.org/r/#/c/125575/ to be finished; then I could edit the whole entity and in that way reorder existing claims. --Bthfan (talk) 22:08, 28 June 2014 (UTC)
Still blocked by https://gerrit.wikimedia.org/r/#/c/125575/, that patch is buggy in some way. I guess it will take a while until the bot code is finished so that it can do some test run :/ --Bthfan (talk) 07:01, 9 July 2014 (UTC)

@Bthfan: gerrit:125575 can manage qualifiers and sorting as well. Even if it's not close to being merged, you can test it locally (SamoaBot is running on Tool Labs with that change, just now). However, your code does not appear to take full advantage of the new system. See an example of the correct usage:

claim = pywikibot.Claim(site, pid)
claim.setTarget(value)
qual = pywikibot.Claim(site, pid, isQualifier=True)
qual.setTarget(value)
if qual.getID() not in claim.qualifiers:
    claim.qualifiers[qual.getID()] = []
claim.qualifiers[qual.getID()].append(qual)

--Ricordisamoa 19:37, 23 July 2014 (UTC)

Ok, thanks for the example. I did test that patch; it broke the addQualifier function in pywikibot (I left a comment on Gerrit). So your example code is the new way to add a qualifier, and one should no longer use addQualifier? --Bthfan (talk) 20:17, 23 July 2014 (UTC)
That is the only fully working way. I plan to support addQualifier (in the same or in another patch), but editEntity() would have to be called to apply the changes. --Ricordisamoa 18:09, 24 July 2014 (UTC)
Ah, I see how it should work then :). Ok, I'll try that. --Bthfan (talk) 11:56, 25 July 2014 (UTC)
Just an update on this: the bot is currently blocked by a bug in the Wikidata API (at least it looks like a bug to me), see https://bugzilla.wikimedia.org/show_bug.cgi?id=68729 (wbeditentity ignores a changed claim order when using that API function). --Bthfan (talk) 08:19, 6 August 2014 (UTC)
@Bene*, Vogone, Legoktm, Ymblanter: Any 'crat to comment?--GZWDer (talk) 10:52, 11 August 2014 (UTC)
My understanding is that we are not yet ready for approval and waiting for a bug to be resolved.--Ymblanter (talk) 13:56, 11 August 2014 (UTC)
That's correct. I could try to use another API function for reordering the claims (there is one: wbsetclaim with the index parameter), but this would mean modifying pywikibot quite a bit, as this API function is not yet used/implemented in pywikibot. I currently don't have enough time for that :) --Bthfan (talk) 14:25, 11 August 2014 (UTC)

Fatemibot[edit]

Fatemibot (talkcontribsSULBlock logUser rights logUser rights management)
Operator: Fatemi127 (talkcontribslogs)

Task/s: Update sitelinks and items when categories, articles, etc. are moved in any namespace on the Persian Wikipedia (fawiki)

Code: Pywiki (This code)

Function details: Moving a category requires updating its item on Wikidata. My bot can move categories and needs a bot flag on Wikidata for this work. --H.Fatemi 08:59, 25 April 2014 (UTC)
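A minimal pywikibot sketch of the kind of update this involves (my own illustration with placeholder titles, not the operator's script):

import pywikibot

fawiki = pywikibot.Site('fa', 'wikipedia')
old_page = pywikibot.Page(fawiki, 'Category:Old title')  # placeholder title
new_page = pywikibot.Page(fawiki, 'Category:New title')  # placeholder title

# Find the Wikidata item attached to the old category page...
item = pywikibot.ItemPage.fromPage(old_page)

# ...and point its fawiki sitelink at the new title after the move.
item.setSitelink({'site': 'fawiki', 'title': new_page.title()},
                 summary='Update sitelink after category move')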

Precautionary oppose since the code is not very well written and seems to use hard-coded configuration specific to another bot. I could change my opinion once the code is cleaned up. --Ricordisamoa 14:56, 25 April 2014 (UTC)
@Ricordisamoa: Hi, this code runs very well. Please see this history for وپ:دار. User:Rezabot and User:MahdiBot work with this code. :) My bot also has a flag on the Persian Wikipedia (fawiki), and this month (reference) Fatemibot has made more than 15,000 edits, which means I am able to do this properly. Please give me a chance to do a test run to prove it to you, thanks. H.Fatemi 15:30, 25 April 2014 (UTC)
Nothing is preventing you from making a short test run (50-250 edits) :-) --Ricordisamoa 15:36, 25 April 2014 (UTC)
thanks H.Fatemi 21:15, 25 April 2014 (UTC)
BTW, very simple changes like these should really be made in a single edit. --Ricordisamoa 21:01, 25 April 2014 (UTC)
@Ricordisamoa: See Special:Contributions/10.68.16.37 :-| Because of a typo on my part those edits were made while logged out (from an IP), but I have corrected my username in the code and submitted a page as a test job to Labs. Please wait until other users request changes for my bot at w:fa:وپ:دار; it is currently empty! I will come back soon. Thanks a lot, and forgive my weak English. H.Fatemi 21:15, 25 April 2014 (UTC)
@Ricordisamoa: ✓ Done about 110 edits :) H.Fatemi 06:42, 26 April 2014 (UTC)
@Fatemi127: my first comment still applies, so you'd have to wait for a bureaucrat. --Ricordisamoa 20:32, 8 May 2014 (UTC)
@Bene*, Vogone, Legoktm, Ymblanter: Any 'crat to comment?--GZWDer (talk) 10:52, 11 August 2014 (UTC)
@Fatemi127, Bene*, Vogone, Legoktm, Ymblanter: Is it ready to be approved?--GZWDer (talk) 12:31, 19 September 2014 (UTC)
We clearly have a problem here, and I do not see how it was resolved.--Ymblanter (talk) 12:38, 19 September 2014 (UTC)

ValterVBot 12[edit]

ValterVBot (talkcontribsSULBlock logUser rights logUser rights management)
Operator: ValterVB (talkcontribslogs)

Task/s: Delete all population (P1082) claims that I have added to Italian municipality items.

Code:

Function details: After this discussion and this one, it is necessary to delete population (P1082) from all Italian municipality items, because the source is Istituto Nazionale di Statistica (Q214195) and it uses a "CC-BY" license (Legal notes). With this license it is impossible to reuse the data outside of Wikidata, because we use "CC0". --ValterVB (talk) 19:12, 11 April 2014 (UTC)
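For illustration, a hedged pywikibot sketch of what removing those claims could look like (the item and summary are placeholders; this is not necessarily how ValterVB's bot is implemented):

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def remove_population_claims(qid):
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    claims = item.claims.get('P1082', [])
    if claims:
        # removeClaims accepts a list of Claim objects
        item.removeClaims(claims, summary='Remove population (P1082) imported from ISTAT')

remove_population_claims('Q490')  # placeholder item (Milan)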

I don't think such data can be copyrightable :/ --Ricordisamoa 19:41, 11 April 2014 (UTC)
@ValterVB: If it's really a copyvio, should we revdel the related versions?--GZWDer (talk) 07:24, 12 April 2014 (UTC)
I think that technically it isn't a copyvio, because I gave appropriate credit inside Wikidata. But if someone uses this data outside of Wikidata without credit, there is a problem. So it's probably sufficient to delete the data. --ValterVB (talk) 08:29, 12 April 2014 (UTC)
I don't see any indication that this is a real problem. The CC-BY 3.0 license of ISTAT doesn't waive database rights (only CC-0 does; and for our purposes CC-BY 4.0 as well), but the data is being used nonetheless on Wikipedia. Wikimedia projects are already ignoring ISTAT's database rights and m:Wikilegal/Database Rights doesn't say it's a problem to store (uncopyrightable) data which was originally extracted against database rights. --Nemo 08:48, 14 April 2014 (UTC)
Maybe ask WMF Legal to weigh in, or ask ISTAT with the email template to be developed whether they are OK with our usage and republication. If either takes an unreasonable time to go through, or comes back negatively, then I agree with the proposal. Content that has been imported directly from a database with an incompatible license should be removed. Assuming the template gets developed, does anyone have connections to ISTAT to ask? --Denny (talk) 20:18, 14 April 2014 (UTC)
The license of the database is not incompatible; the data itself is PD-ineligible in Italy (a. not innovative, b. official document of the state). The problem, as usual, is database rights, but see above.[10] Someone from WMIT will probably ask them to adopt a clearer license in that regard, but I wouldn't worry too much. --Nemo 15:54, 15 April 2014 (UTC)
@Nemo, sorry but I don't understand you when you say there is no problem: CC0 is not CC-BY. 1) Wikipedia respects the CC-BY licence, and 2) the authors selected a licence for their work, so database rights are no longer applicable. Wikidata doesn't respect CC-BY, so that's the main problem. Database rights are a problem for databases which are not free: you can always use their data under the short-citation right, but it becomes difficult when a lot of short citations are made in the same document. Snipre (talk) 15:10, 15 May 2014 (UTC)
This is like the PD-art discussion. IANAL, but as far as I know the US doesn't have database rights, see also en:Sui generis database right. So Wikidata as a site shouldn't have any problems. If User:ValterVB is in a country that does have these laws (Italy), he might be liable as a person. Sucks, man, but we're not going to delete it because of that. Be more careful in the future. This request has already been open for quite some time. It should probably be closed as denied. Multichill (talk) 17:00, 22 November 2014 (UTC)

Structor[edit]

Structor (talkcontribsSULBlock logUser rights logUser rights management)
Operator: Infovarius (talkcontribslogs)

Task/s: Setting claims, labels and descriptions. Small batches (<1000) of edits which would be tedious to do by hand. Particular task: to provide structural information for species items. Genera under consideration.

Code: Uses API functions through the URLFetch function of Wolfram Mathematica, which is not open source. Mathematica code of the main edit function, Mathematica code of the task.

Function details: Just an example: the bot gets a list of potential items, e.g. for the genus Eutreptia. I review the list and extract a working list (here I removed the genus itself). Then the bot goes and makes such edits. I choose the item for the genus myself, or create it if necessary. --Infovarius (talk) 15:50, 9 April 2014 (UTC)
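The operator's code is Mathematica; purely as an illustration of the logic described here (adding instance of and parent taxon to a species item), an equivalent Python/pywikibot sketch might look like this, with the genus item assumed to have been chosen beforehand:

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def add_structure(species_qid, genus_qid):
    item = pywikibot.ItemPage(repo, species_qid)
    item.get()

    # instance of (P31): taxon (Q16521)
    if 'P31' not in item.claims:
        claim = pywikibot.Claim(repo, 'P31')
        claim.setTarget(pywikibot.ItemPage(repo, 'Q16521'))
        item.addClaim(claim, summary='Add P31: taxon')

    # parent taxon (P171): the genus, derived from the first word of the binomial
    if 'P171' not in item.claims:
        claim = pywikibot.Claim(repo, 'P171')
        claim.setTarget(pywikibot.ItemPage(repo, genus_qid))
        item.addClaim(claim, summary='Add P171: parent taxon')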

Making strange, unsourced edits to universe (Q1) like these? It's completely unclear what your strangely named bot will do. A common name for your first bot would be InfovariusBot. Regards --Succu (talk) 18:52, 9 April 2014 (UTC)
Q1 was a first-try error. As its name suggests, Structor will build structures from different properties. I don't know if it needs a bot flag for small tasks. I hoped that the flag would help me make quicker edits through the API, but maybe that's not the case. --Infovarius (talk) 11:19, 10 April 2014 (UTC)
I don't want to rename the bot as it has a SUL account with ~0.5 million edits. --Infovarius (talk) 11:21, 10 April 2014 (UTC)
@Infovarius: Your bot added a lot of duplicate P31 and P171 claims. Why?--GZWDer (talk) 12:57, 10 April 2014 (UTC)
A lot? I know of one duplicate P31, which I've already corrected. --Infovarius (talk) 16:49, 10 April 2014 (UTC)
@Infovarius: And please do not use confusing edit summaries [11]--GZWDer (talk) 04:56, 11 April 2014 (UTC)
I want to do it as precisely as I can. What variant do you propose? Infovarius (talk) 05:00, 11 April 2014 (UTC)
@GZWDer:, please tell me what the summary should be. --Infovarius (talk) 15:54, 25 April 2014 (UTC)
@Infovarius: If you use wbeditentity, you can set the summary to "import Pxx/Pxx/Pxx"; you can omit the summary if you use wbcreateclaim.--GZWDer (talk) 09:00, 26 April 2014 (UTC)
Don't forget to add stated in (P248) as a reference.--GZWDer (talk) 09:02, 26 April 2014 (UTC)
Hm, there's a problem. I am deriving the genus from the species' Latin names. What should I give as the source? Infovarius (talk) 12:03, 26 April 2014 (UTC)
@Infovarius: You can use no source if the claim is obvious.--GZWDer (talk) 12:12, 26 April 2014 (UTC)
AFAIU, only the Mathematica engine is non-free, while you can redistribute programs based on it. --Ricordisamoa 00:10, 12 April 2014 (UTC)
@Infovarius: do you haz teh codez? :P --Ricordisamoa 23:26, 23 April 2014 (UTC)
@Ricordisamoa, Succu: I've updated the request with the codes. Infovarius (talk) 11:46, 1 July 2014 (UTC)

@Infovarius: For your tests you should use our test repository, test.wikidata.org, not Wikidata (see the history of Hydrangea candida). --Succu (talk) 12:09, 16 April 2014 (UTC)

There is no reaction and an obvious lack of experience, so I have decided to oppose. --Succu (talk) 21:57, 21 April 2014 (UTC)
@Succu: You should ping @Infovarius: so that he gets the message.--GZWDer (talk) 10:24, 23 April 2014 (UTC)
@Infovarius: Do more test edits please!--GZWDer (talk) 10:25, 23 April 2014 (UTC)
I've learned API:wbeditentity, so I can now do such edits. Infovarius (talk) 21:26, 23 April 2014 (UTC)
@Succu: Structor is being tested at testwikidata:Special:Contributions/Structor.--GZWDer (talk) 14:53, 25 April 2014 (UTC)
@GZWDer: I know. I have seen some test edits. But I don't see a reasonable summary, which you asked for. @Infovarius: You should choose a clear and limited area of operation for your first bot run. It would be nice if you could define it and run some further test edits. --Succu (talk) 15:17, 25 April 2014 (UTC)
I think that my task will be: "To provide structural information for species items." There are so many of empty (without properties) species which I am running into. --Infovarius (talk) 12:03, 26 April 2014 (UTC)
It's not true that most taxa have no properties. Two questions:
  1. There are several hundred genera with the same name. How do you identify the correct one for parent taxon (P171)?
  2. Based on which assumptions will you use taxon (Q16521) / monotypic taxon (Q310890) for instance of (P31)?
--Succu (talk) 06:45, 30 April 2014 (UTC)
1. I am trying to skip homonymous genera at first. 2. I always use taxon (Q16521). While monotypic taxon (Q310890) would, I suppose, always be OK for species too, the superset is also correct. --Infovarius (talk) 21:24, 3 May 2014 (UTC)
  1. To „skip homonymous genera” you have to be aware of them, so I have to repeat my question: How do you identify them?
  2. You suppose? Species are never monotypic.
Dear Infovarius, would you mind informing WikiProject Taxonomy about your bot plans? --Succu (talk) 21:47, 3 May 2014 (UTC)
Thank you for recommendation, ✓ Done. --Infovarius (talk) 11:46, 1 July 2014 (UTC)
I added some remarks over there. --Succu (talk) 17:05, 2 July 2014 (UTC)
@Bene*, Vogone, Legoktm, Ymblanter, The Anonymouse: Any 'crat to comment?--GZWDer (talk) 04:56, 30 April 2014 (UTC)


@Infovarius: Could you explain this change, please. Thx. --Succu (talk) 19:35, 14 July 2014 (UTC)

I think this is a page logging all actions by the bot. @Bene*, Vogone, Legoktm, Ymblanter: Any 'crat to comment?--GZWDer (talk) 10:51, 11 August 2014 (UTC)
  • Let us return here. What is the current situation?--Ymblanter (talk) 07:46, 16 September 2014 (UTC)
The discussions stopped here. --Succu (talk) 07:59, 16 September 2014 (UTC)
  • Times are changing. The genus-species linking task has now nearly expired because Succu has done most of it. I can still perform the task of labelling species with their scientific name. I support en, de, fr and ru labels and descriptions, but can gather information for as many languages as possible. However, I won't do it without the flag being approved, as I'm afraid it will be in vain because of the vast bureaucracy... Infovarius (talk) 19:01, 14 October 2014 (UTC)
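A minimal pywikibot sketch of that labelling task, as I understand it (my own illustration, not Infovarius's code); the scientific name is language-independent, so the same string can be set for every language:

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def label_species(qid, scientific_name):
    item = pywikibot.ItemPage(repo, qid)
    # Set the same scientific name as the label in each supported language.
    labels = {lang: scientific_name for lang in ('en', 'de', 'fr', 'ru')}
    item.editLabels(labels, summary='Set species label to scientific name')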
There are around 300,000 items left. :) --Succu (talk) 20:47, 14 October 2014 (UTC)
Why don't the two of you agree on some well-defined task? Then I could flag the bot.--Ymblanter (talk) 06:44, 15 October 2014 (UTC)

Global Economic Map Bot[edit]

Global Economic Map Bot (talkcontribsSULBlock logUser rights logUser rights management)
Operators: Alex and Amir

Task/s: The Global Economic Map Bot will be the primary bot to update the Global Economic Map project. It will retrieve data from a variety of economic databases.

Code: Python

Function details: The bot will retrieve data from World Bank Indicators, UN Statistics, International Labor Organization, Bureau of Economic Analysis, Gapminder World, OpenCorporates and OpenSpending. The retrieved data will be used to automatically update Wikidata with economic statistics and to update the Global Economic Map project. --Mcnabber091 (talk) 21:42, 26 January 2014 (UTC)
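As a hedged example of one such retrieval, the public World Bank indicator API can be queried roughly as follows; the indicator code and response layout reflect the v2 API as I understand it and may need adjusting:

import requests

def worldbank_indicator(country_code, indicator, year):
    # The World Bank API returns a JSON array: [metadata, list of records].
    url = 'http://api.worldbank.org/v2/country/{}/indicator/{}'.format(
        country_code, indicator)
    response = requests.get(url, params={'format': 'json', 'date': str(year)})
    response.raise_for_status()
    data = response.json()
    records = data[1] if len(data) > 1 and data[1] else []
    return records[0]['value'] if records else None

# Example: nominal GDP (NY.GDP.MKTP.CD) of Italy in 2012
print(worldbank_indicator('IT', 'NY.GDP.MKTP.CD', 2012))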

I'm helping with harvesting and adding this data. Amir (talk) 21:47, 26 January 2014 (UTC)
@Mcnabber091, Ladsgroup: Is this request still needed? Vogone talk 13:18, 30 March 2014 (UTC)
yes Amir (talk) 13:39, 30 March 2014 (UTC)
Could you create the bot account and run some test edits? The Anonymouse [talk] 17:09, 7 May 2014 (UTC)
@Ladsgroup:--GZWDer (talk) 05:06, 11 June 2014 (UTC)
Can you please give us several months so that the datatype can be implemented? Amir (talk) 14:02, 17 June 2014 (UTC)

KunMilanoRobot[edit]

KunMilanoRobot (talkcontribsSULBlock logUser rights logUser rights management)
Operator: Kvardek du (talkcontribslogs)

Task/s:

  • Add French 'intercommunalités' to French commune items (example)
  • Add French commune populations
  • Correct INSEE codes of French communes

Code:

Function details: Takes the name of the 'communauté de communes' from the INSEE database and adds it to the item if necessary, with a point in time and a source. Uses pywikipedia. --Kvardek du (talk) 19:27, 21 January 2014 (UTC)
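For illustration, a hedged pywikibot sketch of the described edit (placeholder item IDs and dates; the thread below discusses whether point in time is the right qualifier at all):

import pywikibot

site = pywikibot.Site('wikidata', 'wikidata')
repo = site.data_repository()

def add_intercommunalite(commune_qid, epci_qid):
    item = pywikibot.ItemPage(repo, commune_qid)

    # located in the administrative territorial entity (P131): the intercommunalité
    claim = pywikibot.Claim(repo, 'P131')
    claim.setTarget(pywikibot.ItemPage(repo, epci_qid))
    item.addClaim(claim, summary='Add intercommunalité from INSEE data')

    # point in time (P585): the membership is known to hold on 1 January 2014
    qualifier = pywikibot.Claim(repo, 'P585')
    qualifier.setTarget(pywikibot.WbTime(year=2014, month=1, day=1))
    claim.addQualifier(qualifier)

    # retrieved (P813): the date the INSEE base was consulted (placeholder date)
    retrieved = pywikibot.Claim(repo, 'P813')
    retrieved.setTarget(pywikibot.WbTime(year=2014, month=1, day=21))
    claim.addSources([retrieved])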

Imo the point in time qualifier isn't valid here as the property isn't time-specific. -- Bene* talk 15:10, 22 January 2014 (UTC)
Property:P585 says "time and date something took place, existed or a statement was true", and we only know that the data was true on January 1st, due to numerous changes in the French administrative organization. Kvardek du (talk) 12:18, 24 January 2014 (UTC)
Interesting, some comments:
  • Not sure that "intercommunalités" are really administrative divisions (they are built from the bottom rather than from the top). part of (P361) might be more appropriate than located in the administrative territorial entity (P131)
  • Populations are clearly needed, but I think we should try to do it well from the start, and that is not easy. That seems to require a separate discussion.
  • INSEE code correction seems to be fine.
  • Ideally, the date qualifiers to be used for intercommunalité membership would be start time (P580) and end time (P582) but I can't find any usable file providing this for the whole country. --Zolo (talk) 06:37, 2 February 2014 (UTC)
Kvardek du: can you add « canton » and « pays » too? (canton is a bit complicated since some cantons contain only fractions of communes)
Cdlt, VIGNERON (talk) 14:01, 4 February 2014 (UTC)
Wikipedia is not very precise about administrative divisions (w:fr:Administration territoriale). Where are the limits between part of (P361), located on terrain feature (P706) and located in the administrative territorial entity (P131)?
Where is the appropriate place for a discussion about population?
VIGNERON: I corrected the INSEE codes, except for the islands: the same problem exists on around 50 articles due to confusion between articles and communes on some Wikipedias (I think).
Kvardek du (talk) 22:26, 7 February 2014 (UTC)
@Bene*, Vogone, Legoktm, Ymblanter, The Anonymouse: Any 'crat to comment?--GZWDer (talk) 14:37, 25 February 2014 (UTC)
I'm still not familiar with the "point in time" qualifier. What about "start date", since you mentioned the system changed at the beginning of this year? Otherwise it might be understood as "this is only true/happened on" some date. -- Bene* talk 21:04, 25 February 2014 (UTC)
Property retrieved (P813) is for the date the information was accessed and is used as part of a source reference. point in time (P585) is for something that happened at a single point in time. It is not appropriate for these entities, which endure over a period of time. Use start time (P580) and end time (P582) if you know the start and end dates. Filceolaire (talk) 21:19, 25 March 2014 (UTC)

Support if the bot uses start time (P580) and end time (P582) instead of point in time (P585). --Pasleim (talk) 16:48, 28 September 2014 (UTC)