User talk:Andrawaag

From Wikidata

Welcome to Wikidata, Andrawaag!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, please ask me on my talk page. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.

Best regards! --Tobias1984 (talk) 18:10, 1 August 2014 (UTC)

ProteinBoxBot

I think your bot is editing logged out, I'm going to block the IP until your bot is logged back in. See here. --AmaryllisGardener talk 14:03, 11 October 2014 (UTC)

~1000 duplicates

Hello, looks like your bot created ~1000 duplicate items yesterday. Please see Wikidata:Database reports/Constraint violations/P351#"Unique value" violations. — Ivan A. Krestinin (talk) 17:34, 16 October 2014 (UTC)

@Ivan A. Krestinin: Thanks for noticing. I will work towards a fix. Andrawaag (talk) 19:56, 16 October 2014 (UTC)
Hey Andrawaag. You could use WikidataQuery to check whether an item with a certain claim already exists. For example, [1] returns all items with the claim Entrez Gene ID (P351)=1017. However, this will probably slow down your bot significantly. Another option is to use [2] to get all items and their P351 values in a single call. --Pasleim (talk) 07:09, 18 October 2014 (UTC)
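Pasleim's second option is to make one bulk call and then do local lookups. It can be sketched as below.

This is a minimal sketch under stated assumptions:
- The query text and function names are illustrative. They are not the WikidataQuery API linked as [2].
- The endpoint is assumed to return the standard SPARQL 1.1 JSON results format (`results` containing `bindings`).
- The sketch stops rather than guesses when an ID is already on several items.

```python
# Minimal sketch: build an in-memory index of existing Entrez Gene IDs
# (P351) once, then check it before creating any new item.
# Assumption: results arrive in the standard SPARQL 1.1 JSON format.

# Hypothetical query: one row per (item, Entrez Gene ID) pair.
P351_QUERY = """
SELECT ?item ?geneId WHERE {
  ?item wdt:P351 ?geneId .
}
"""


def index_by_gene_id(sparql_json):
    """Map each Entrez Gene ID to the set of QIDs that carry it."""
    index = {}
    for row in sparql_json["results"]["bindings"]:
        # Item values are entity URIs; keep only the trailing QID.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        gene_id = row["geneId"]["value"]
        index.setdefault(gene_id, set()).add(qid)
    return index


def existing_item(gene_id, index):
    """Return the QID to update, or None if a new item is needed.

    Raises ValueError when several items already share the ID, since
    writing to either one would make the duplicate problem worse.
    """
    qids = index.get(gene_id, set())
    if len(qids) > 1:
        raise ValueError(f"P351={gene_id} is already on {sorted(qids)}")
    return next(iter(qids), None)
```

Fetching the index once per run trades thousands of small queries for one large one, which avoids the slowdown mentioned above. The trade-off is that the index can go stale if other editors create items during a long run.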

OMIM ID (P492)

Hi, your bot imported some values yesterday that created some violations; I tried a few of them with just the number (not the MTHU part), but they all still seem invalid, so I'm wondering if maybe you used the wrong property for this import? Jon Harald Søby (talk) 15:35, 12 February 2015 (UTC)

User_talk:ProteinBoxBot#Given_names

Please see the comment above. --- Jura 09:47, 9 May 2015 (UTC)

@Jura: Thanks for noticing. The duplicates were unfortunately created in a bot run on May 5th. I responded to the comments by filing a bulk deletion request; unfortunately, that didn't go well. Because I have been travelling, I wasn't able to respond earlier. I am working towards a fix. Andrawaag (talk) 22:48, 10 May 2015 (UTC).

P2175 and P2176

medical condition treated (P2175) and drug used for treatment (P2176) are ready. Some discussion about labels and descriptions still needed. --Tobias1984 (talk) 12:29, 5 October 2015 (UTC)

ProteinBoxBot Mistake?

I think this edit by ProteinBoxBot is a mistake, because the formerly linked article en:Huntingtin fits the description of the label perfectly. The UniProt ID is also the same (P42858).--Sonabi (talk) 17:55, 8 October 2015 (UTC)

The bot is removing Wikipedia links from many items just to add them to another item, where the first item describes the protein and the second the related gene, like here and, for the item describing the gene, here. What is the point of that? --Sonabi (talk) 22:58, 8 October 2015 (UTC)

@Sonabi: It is not a mistake; it is done on purpose and is motivated by the release of arbitrary access. The Wikidata model allows a given Wikipedia page to be linked from only one item. This is limiting, because a protein page can contain a lot of gene information. Because of this one-page limit, a link to the appropriate Wikipedia page can't also be set on the Wikidata item for that gene. However, with arbitrary access in place, gene information can be harvested from Wikidata gene items and used on Wikipedia pages about proteins. Moving the Wikipedia links from protein Wikidata items to the appropriate gene items is needed before we can use arbitrary access in our workflows. Andrawaag (talk) 14:10, 9 October 2015 (UTC)
Yes, but the actual problem, which I forgot to mention, is that the bot transfers the English entries but not the entries in other languages. As a result, the connection between the related articles in the different languages will be lost. Or will the bot also move these entries in the future? --Sonabi (talk) 14:53, 9 October 2015 (UTC)
@Sonabi: It's crucial for the successful use of Wikidata content in gene articles on all the Wikipedias that we use a stable data structure. After considerable discussion on Wikidata 1 2 and on EN Wikipedia [3], the community decided on a model in which the structured information that would typically be consolidated in a single textual article is distributed across multiple, interlinked Wikidata items corresponding to genes, proteins (or other gene products), and orthologous genes. To use that structure to build Wikipedia infoboxes, we need the interwiki links to originate in the gene item. During an initial import of Wikipedia articles into Wikidata items, many of the gene pages were brought over and classified as protein articles. These edits correct that and improve the data. Note that nothing is being lost. All the data in the protein items that isn't more appropriate for the gene (e.g. the gene expression images) is still there and can be reached from the gene via the encodes property, including the labels in the other languages. We now have two test articles on EN Wikipedia that use this structure to compose their infoboxes directly from Wikidata: see ARF6 and RREB1. It should be possible to reuse these patterns for any language Wikipedia; you would just need to set the interwiki link from the gene item to the correct page in that language's Wikipedia. We know what these links should be for EN Wikipedia, but not necessarily for other language wikis. If you can help us identify the correct links in your languages, we could help with those mappings. --I9606 (talk) 16:10, 9 October 2015 (UTC)
@Sonabi: Just to emphasize that the connections are not lost. If a Wiki article is interwiki-linked to a protein data item and properties are moved from the protein item to a gene item, all of that information can still be retrieved by requesting it through the 'encoded by' connection between the protein and the gene. The conversion that Andrawaag is implementing is just adding content and giving it a better semantic structure that has been agreed upon by the wikidata molecular biology community.--I9606 (talk) 16:28, 9 October 2015 (UTC)

RE: Proteins and genes should not be merged

Hello. I didn't intend to merge proteins with genes; I'm well aware of their differences. I happened to be reading Tafazzina on it.wiki and noticed that the same article exists on en.wiki (Tafazzin), but unlinked, so I linked one to the other. Just compare the two links: they describe the same protein, not the corresponding gene (called TAZ), so maybe the problem lies in the Wikidata information. But they are the same thing, that's for sure. Khruner (talk) 13:56, 17 November 2015 (UTC)
EDIT - For some reason the Italian article seems to encompass both the protein and the gene, so maybe the problem lies here. Khruner (talk) 14:02, 17 November 2015 (UTC)

Phenotype information

I am now learning how the pywikibot interface works. The first step that seems interesting is adding taxon IDs to genomes, since many genomes still lack this information. How should I begin? Should I write my own taxon and DSMZ scripts, merge this into your bot, or are there other guidelines? I tried to find your email address but was unsuccessful.

--jjkoehorst (talk) 06:10, 8 January 2016 (UTC)

@Jjkoehorst: Our bot is based on a Python framework we call ProteinBoxBot. It is basically a two-layer framework. A core layer takes care of communicating with Wikidata and dealing with various Wikidata issues (e.g. duplicate resolution). On top of this core layer, called PBB_Core, resource-specific scripts are developed and maintained. @Sebotic has written a nice blog post that might get you started. Extensive documentation is maintained in our project's Bitbucket repository. However, writing the bot is only part of the solution. We typically follow this workflow with a new resource:
  1. Make sure the data license attached to the source allows distributing content on Wikidata (CC0)
  2. Check for existing records from your resource in Wikidata and make sure they are all correct and accurate
  3. Model 1 or 2 representative records from the resource under consideration
  4. During the modeling process it will become clear whether all the needed properties exist in Wikidata. If not, you need to propose the required properties
  5. When all properties are in place, develop your bot (or take your existing one) and run it on your model items. The bot should not break these items.
  6. Run on 10 items
  7. Run on 100 items
  8. Run on 1000 items
  9. Once confident enough run on all.

I typically leave time in between the subsequent runs for possible issues to surface.
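The staged runs in steps 6 to 9 can be sketched as a generator. It hands out consecutive batches of 10, 100 and 1000 records, and then everything else.

This is a hypothetical helper, not part of PBB_Core. The pause between batches, where issues are given time to surface, is left to the caller.

```python
from itertools import islice


def staged_batches(records, stages=(10, 100, 1000)):
    """Yield consecutive, non-overlapping batches of the given sizes,
    followed by one final batch holding all remaining records.

    Stops early if the records run out before all stages are used.
    """
    it = iter(records)
    for size in stages:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
    rest = list(it)  # step 9: once confident enough, run on all
    if rest:
        yield rest
```

A caller would loop over the batches, run the bot on each one, and inspect the edited items before requesting the next batch.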

I am a bit hesitant to share my email address here, but if wanted you can reach me with a DM on twitter (handle: @andrawaag) --Andrawaag (talk) 16:42, 8 January 2016 (UTC)

bot deleting and altering data

Hello Andrawaag.

The bot User:ProteinBoxBot has recently made an enormous number of changes, among them the following. There are edits with the tag "wikisyntax", where it deletes most of the data (for example [3]). In edits following those, it changes the type from enzyme to protein family (for example [4]), which doesn't seem correct. Can you halt and verify this activity? Hummingbird (talk) 23:18, 11 January 2016 (UTC)

It even undid one of my edits: https://www.wikidata.org/w/index.php?title=Q21149193&diff=prev&oldid=291208902 -- numbermaniac (talk) 01:22, 12 January 2016 (UTC)
@Numbermaniac: Just to repeat my reply here too: sorry about that. I tried to move the interwiki link to a new item by undoing earlier changes, because the Wikipedia page deals with the enzyme class and not with the human-specific type of this enzyme. I will create a new item manually.
@Hummingbird: Hi, sorry for the confusion; I would like to explain what is going on right now. In recent days, I have been cleaning up the Wikidata data model for genes and diseases to align it with what was discussed in the Wikidata project molecular biology [5]. According to this discussion, interwiki links from Wikipedias should go to Wikidata gene items (subclass gene). Only if the topic is really about the species-specific protein alone should the link go to the protein. This data model is also required to populate the Gene Wiki infoboxes with our new 'info box gene' module in English Wikipedia [6], which builds the infobox entirely from data fetched from Wikidata. See also our preprint explaining the details: [7].
What specifically happened in recent days:
  • I merged all orphaned items which were "found in taxon": 'human' and 'subclass': 'protein' but did not have identifiers except their label, into Wikidata human gene items. This affected ~ 2,800 items and it also unified ~800 interwiki links of different Wikipedia languages on the human gene items. I did these merges via script supported manual curation, so most of them should be correct now.
  • There were also ~350 protein items which had interwiki links on them linking to Wikipedia Gene Wiki pages. Unfortunately, some of these links also went to enzyme classes or protein families, not to Gene Wiki pages. Some of these got hijacked recently and transformed the enzyme class/protein family items into human protein items. In the first case, I moved and unified the interwiki links onto the Wikidata human gene items. In the second case, I reverted the changes made earlier to reestablish the enzyme class/protein family. This is what you saw as deleted information in your example, but in total not more than ~100 items were affected by these deletions. The deleted protein information will be re-added to Wikidata as new items in the coming days and linked to the human genes accordingly. This seemed to be the best solution to untangle the protein family/human protein problem. For the enzyme classes and protein families, I will go through all of them and add the enzyme classification numbers and other useful information, so these can be used as subclass of and instance of values on Wikidata species specific protein items like human or mouse. I did these merges also by script assisted manual curation, so this should be quite reliable.
  • Gene ontology term cleanup: I also did extensive Gene Ontology term cleanup to remove wrong terms from Wikidata human and mouse protein items. You can see that because almost all constraint violations for Gene Ontology terms now are cleared [8][9][10][11]. In the coming days, we will also do a fresh import of proteins directly from Uniprot, also involving Gene ontology terms.
In summary, most Gene Wiki Wikipedia pages should now link to their correct Wikidata human gene item and most of the orphaned items, which would confuse users and do not make sense in a human gene/human protein and mouse gene/mouse protein data model, as described above, have now been unified and cleaned up. Except for the enzyme classes/protein families, no data has been deleted, and for those, we are about to re-add the data. I hope this gives you an overview of what I did, looking forward to your comments/suggestions. Best, Sebotic (talk) 07:34, 12 January 2016 (UTC)
Hello Sebotic. The specific cases I mentioned didn't make sense to me, so I just wanted to make sure this wasn't a bot that had got out of control. As long as this is a planned action, I'm fine with it. Thanks for the reply. Hummingbird (talk) 10:19, 12 January 2016 (UTC)

Modification of items

Hello, I am transferring data from botulinum toxin (Q208413) to botulinum toxin type A (Q4095199) because there are different types of botulinum toxin. botulinum toxin (Q208413) will cover the general features of all the toxins (types A to G), and botulinum toxin type A (Q4095199) will focus on botulinum toxin type A. I don't know how this might affect your drug bot, so please take care the next time you update the data. Thank you Snipre (talk) 13:36, 19 January 2016 (UTC)

@Snipre: Hi! Thanks, that's an important cleanup step to perform. ProteinBoxBot will not touch any item which does not have at least one of a set of unique core identifiers (either Drugbank ID, ChEBI, ChEMBL, Pubchem, UNII), so in the Botox case, it would only touch item Q4095199. If the identifiers cannot be mapped reliably, no data will be written, but a conflict will be logged for manual inspection of the item. In case no item with the appropriate identifiers can be found, a new item will be created.
I guess the ChEMBL ID on the general botox item Q208413 should also be transferred to Botox A or deleted? I did a similar cleanup recently, separating the generalized topic of Vitamin B from the actual chemical compound Cyanocobalamin. Stereoisomeric compounds, e.g. amino acids and sugars, also seem to require a lot of cleanup. Very frequently, I see Wikidata items containing a mix of identifiers for all 3 possible cases (e.g. the L-, D- and DL-mix forms). I don't think we will get around manual cleanup here. Best Sebotic (talk) 20:10, 19 January 2016 (UTC)
@Sebotic: I cleaned up the general item about botulinum toxin, so you won't find any type A identifiers there. Snipre (talk) 20:43, 19 January 2016 (UTC)
By the way, can you have a look at these 2 items, Neurotensin (Q419576) and NTS (Q14904891)? One is the gene and one is the protein, but they have the same PubChem ID 16129680. I have trouble determining which is the correct molecule. Thanks. Snipre (talk) 21:01, 19 January 2016 (UTC)
@Snipre: I fixed that one, in addition to the fact that the PubChem ID was outdated, a Pubchem ID should certainly not be on a Wikidata gene item. This info was added by KrBot, which seems to take data from Wikipedia info boxes and add these to Wikidata. I am not sure if these kinds of imports are really useful anymore for domains like genes/proteins or drugs where we do the imports from primary, authoritative databases. Thx! Sebotic (talk) 22:03, 19 January 2016 (UTC)
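Sebotic describes a matching rule for the bot:
- Touch an item only via unique core identifiers.
- Log a conflict when the mapping is not reliable.
- Create a new item when nothing matches.

This can be sketched as a small decision function. The identifier keys and the `lookup` structure are hypothetical. Treating "more than one matching item" as the unreliable case is an assumption about what "cannot be mapped reliably" means.

```python
from typing import Dict, Optional, Set, Tuple

# Core identifier types named in the thread (hypothetical key names).
CORE_IDS = ("drugbank", "chebi", "chembl", "pubchem", "unii")

Lookup = Dict[Tuple[str, str], Set[str]]


def resolve_target(record: Dict[str, str],
                   lookup: Lookup) -> Tuple[str, Optional[str]]:
    """Decide what to do with one source record.

    `lookup` maps (id_type, id_value) to the QIDs that carry that value.
    Returns ("update", qid), ("create", None) or ("conflict", None).
    Labels and aliases are deliberately never used for matching.
    """
    matches: Set[str] = set()
    for id_type in CORE_IDS:
        value = record.get(id_type)
        if value:
            matches |= lookup.get((id_type, value), set())
    if not matches:
        return ("create", None)
    if len(matches) == 1:
        return ("update", next(iter(matches)))
    # Identifiers point at several items: write nothing, log for a human.
    return ("conflict", None)
```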

Update item Q19856779

Hello, I transferred some data from mitomycin (Q417625) to mitomycin (Q19856779), but without the references. Please add that item to your next update session. I only added the identifiers used to extract the other ones from external databases. Snipre (talk) 20:23, 24 January 2016 (UTC)

Please check if these items can be merged

Hello, can you check if your bot didn't create duplicates for the following cases:

Thanks Snipre (talk) 22:08, 2 February 2016 (UTC)

@Snipre: Hi! The bot can create duplicate items on purpose under clearly defined circumstances. The bot searches for items which have a certain set of unique IDs (Drugbank, Pubchem, ChEMBL, CHEBI, KEGG, Inchi_key). If it does not find an existing item on that basis, it creates a new item. This is what happened here. Item Ephedra (Q20817199) got created on 13th August 2015, but item Ephedra (Q13530468) only got added the Drugbank ID on 13th October, so before 13th October item Ephedra (Q13530468) could not have been recognized as the appropriate item by my bot. This is done, because matching labels or aliases can cause extensive problems by writing to the wrong item(s). For data consistency, it is better to create a new item and merge it later on, than to produce Wikidata items with the wrong data on them. Should at some point both items have one of the IDs mentioned above, they will be detected by my bot. You can either merge these items or I will do it in a few days, so you have time to recapitulate how these duplicate items came into existence. Sebotic (talk) 23:06, 2 February 2016 (UTC)
@Sebotic: OK, thanks for the explanation. For Ephedra we can keep the two items separate, one for the drink (medicinal preparation) and one for the molecule. But I didn't find a clear definition of the molecule, which is why I wonder whether the molecule really exists. To me, DrugBank is not clear about the difference between a mixture of molecules and a single molecule. Snipre (talk) 08:20, 3 February 2016 (UTC)
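The later clean-up step is to spot items that end up sharing an identifier so they can be reviewed and merged. This is the same situation the "Unique value" constraint report flags. It can be sketched by grouping items on (property, value) pairs.

The input layout and identifier keys here are hypothetical. A shared value only makes two items merge candidates. As with Ephedra above, a human still has to decide whether they describe the same thing.

```python
from collections import defaultdict


def merge_candidates(items):
    """Find identifier values that appear on more than one item.

    `items` maps QID -> {id_type: value}. Returns a sorted list of
    (id_type, value, [QIDs]) tuples, one per shared value.
    """
    by_value = defaultdict(set)
    for qid, ids in items.items():
        for id_type, value in ids.items():
            by_value[(id_type, value)].add(qid)
    return sorted(
        (id_type, value, sorted(qids))
        for (id_type, value), qids in by_value.items()
        if len(qids) > 1
    )
```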

P2888

exact match (P2888) is ready. --Tobias1984 (talk) 19:28, 5 June 2016 (UTC)

Problems with ProteinBoxBot

Hello, Andrawaag. I've just described a problem with ProteinBoxBot on its talk page. (It's topic #28). I don't know how often you check that page, but perhaps you'd like to take a look. Thanks for your attention. Akhooha (talk) 20:22, 4 July 2016 (UTC)

Hello Akhooha, I have just responded to you on that page. --Andrawaag (talk) 20:36, 4 July 2016 (UTC)

Share your experience and feedback as a Wikimedian in this global survey

  1. This survey is primarily meant to get feedback on the Wikimedia Foundation's current work, not long-term strategy.
  2. Legal stuff: No purchase necessary. Must be the age of majority to participate. Sponsored by the Wikimedia Foundation located at 149 New Montgomery, San Francisco, CA, USA, 94105. Ends January 31, 2017. Void where prohibited. Click here for contest rules.

Unused properties[edit]

This is a kind reminder that the following properties were created more than six months ago: MGI Gene Symbol (P2394), UCSC Genome Browser assembly ID (P2576). As of today, these properties are used on less than five items. As the proposer of these properties you probably want to change the unfortunate situation by adding a few statements to items. --Pasleim (talk) 19:08, 17 January 2017 (UTC)

@Pasleim: Thanks for the reminder. MGI Gene Symbol (P2394) has been added to the workflow and is now used on more items. UCSC Genome Browser assembly ID (P2576) points to reference genomes, of which we currently cover only 4. We plan to extend this in the near future, but for now I hope this small number of items is okay. --Andrawaag (talk) 17:54, 19 January 2017 (UTC)

Your feedback matters: Final reminder to take the global Wikimedia survey

(Sorry for writing in English)

Disambiguation pages standing in Dutch election

Great work on adding the election candidates. However, there are a few places where you've marked the details on a disambiguation page rather than the actual candidate:

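# Items that have a candidacy statement (P3602) and are
# disambiguation pages (instance of Q4167410)
SELECT ?item ?itemLabel
{
  ?item wdt:P3602 ?election; wdt:P31 wd:Q4167410 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl" . }
}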


--Oravrattas (talk) 07:06, 5 March 2017 (UTC)

Thanks for reporting these. They are fixed now.

--Andrawaag (talk) 14:15, 5 March 2017 (UTC)

Descriptions

Hi Andrawaag, descriptions are not the start of a sentence, so they normally don't begin with a capital letter. For items such as Jan Baas (Q28872044) you'd probably be better off with something like "Nederlands politicus" ("Dutch politician"). Multichill (talk) 18:02, 13 March 2017 (UTC)

I've put an overview at User:Sjoerddebruin/Dutch politics/Tweede Kamerverkiezingen 2017. It's starting to look good! Multichill (talk) 19:54, 13 March 2017 (UTC)
And if you've got a taste for it: 2012. Multichill (talk) 20:31, 13 March 2017 (UTC)
@Multichill: Honestly, I have got quite a taste for it. But it may be more interesting to first add the results per candidate, as soon as they are available. I also still need to find the time. Right now it is relatively labour-intensive, because I do everything via QuickStatements. The bot accounts I have access to don't have task permission for election data. That said, shall we create a bot account together, specifically for election data? In any case it would be interesting to see whether the Wikidata integrator platform we now use for genes, proteins and diseases can also be applied generically to other domains. --Andrawaag (talk) 21:59, 14 March 2017 (UTC)
Nationally active politicians in the Netherlands is a domain where quite a lot of work has already been done. That's why I had put it under User:Sjoerddebruin/Dutch politics.
You can create a bot account for your own small projects, such as the politicians. I believe I now have about a dozen bots for all sorts of different tasks.
I'm active in all sorts of domains myself, but lately mainly paintings. We now have more than 200,000 of those :-) Multichill (talk) 17:57, 15 March 2017 (UTC)
Do you have a handy way to add [12] to the right person? Multichill (talk) 15:16, 1 April 2017 (UTC)

stated in: Banque-Carrefour des Entreprises

This reference lacks precision: a work with a page number, a URL? --Jmh2o (talk) 13:45, 14 May 2017 (UTC)

Quite right. I have added the link to Q16626729.--Andrawaag (talk) 10:33, 15 May 2017 (UTC)

any reason to use P31 and P279 for the same value?

tinyurl.com/lardh6y

Is it true that every Q14912958 gene is indistinguishable from every other Q14912958 gene?

I would model grain of sand as a class (P279 of some other class), even though we cannot distinguish the vast majority of grains, because we can isolate any single grain and claim P31 grain of sand. d1g (talk) 10:06, 15 May 2017 (UTC)

biological pathway

I'm not sure: can we merge no label (Q28864279) (that you created) and biological pathway (Q4915012)? Thank you, Tubezlob (🙋) 19:49, 16 May 2017 (UTC)

Belgian post code

Hi,

There is currently a discussion on Wikidata:Bistro (Topic:Trdng8v51r0sq785) about the unusual duplicate postal codes you added to several Belgian communes. Could you explain why you did this (in French or in English, as you prefer)?

Regards, VIGNERON (talk) 11:20, 28 May 2017 (UTC)

Replacing values and descriptions

I'm not sure if I'm happy with edits like Special:Diff/528268556 and Special:Diff/528268773.

  • You're replacing a valid French label with a screaming label
  • The new descriptions lack capital letters
  • You're deleting valid P31 values
  • You're adding coordinates as separate statements on companies, while those should be qualifiers of headquarters location
  • The date of inception differs, still you delete the current value instead of adding another one

Sjoerd de Bruin (talk) 14:21, 29 July 2017 (UTC)

I also think that the organization for example behind a music festival should have its own item. Sjoerd de Bruin (talk) 14:22, 29 July 2017 (UTC)

User:Sjoerddebruin Thanks for noticing.

Indeed, this is the official name as they registered it, but that is no excuse. I will decapitalise these.

In both cases the values were only deleted if a proper reference was missing; the inception dates now added are the official inception dates according to governmental data. My reasoning was that once you have an official inception date, you can delete the unreferenced one.
Is there a policy on this? It is not always a headquarters location. Per Belgian Enterprise Number, multiple coordinates can apply. I was actually considering listing them all under P625.
I might agree with you here. What I did in this exercise was add further properties from the Crossroads Bank for Enterprises, based on enterprise numbers that had already been entered. So the festival already had an enterprise number. In such cases I would argue that the enterprise ID should be removed from the festival item itself and a link made to the organisation on a separate WD item  – The preceding unsigned comment was added by Andrawaag (talk • contribs).

Same for Vrije Universiteit Brussel (Q612665) or Federal Public Service Budget and Management Control (Q636971) etc.

Please fix all these errors in all your edits. Next time, please consider consulting about such an import at Wikidata:Project chat or Wikidata:WikiProject Companies talk. Jklamo (talk) 09:58, 1 August 2017 (UTC)

@Jklamo: Yes, I have stopped overwriting unreferenced P31 statements. I disagree that those are valid P31 statements, because it is hard to tell whether a P31 statement is valid when no reference backs the claim. Having said that, as you rightly mention, the business enterprise P31 statement I added does not make sense either, since non-commercial enterprises also exist in the source. So I am not in a position to strongly disagree. I need to figure out a better approach here. So yes, I will leave the P31 statements in place.
Can you point me to where the headquarters/coordinates pairing was decided? I am wondering whether this can be discussed. This pattern is in itself more error-prone than using coordinates directly as a statement. Besides geocoding (records typically use strings to describe places), there is an extra step to resolve the location name to its Wikidata identifier. Locations can have multiple names, and multiple locations can have the same name. So having to resolve the identifiers after geocoding introduces an additional step that might lead to duplicates being generated. I am currently figuring out whether this can be solved using SPARQL, but that only works if additional properties such as postal codes are available for all locations.
Either way, I will consult the project chat on my next attempts. This was not so much an import as an attempt to make sense of all the Wikidata items that have a Belgian Enterprise Number (P3376) statement. --Andrawaag (talk) 19:12, 1 August 2017 (UTC)
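The ambiguity Andrawaag describes is that one place can have several names and several places can share one name. It can be sketched as a resolver that returns a QID only when the name plus postal code points at exactly one place. Otherwise it returns None, so the record can be logged instead of creating a duplicate.

The data layout is hypothetical. In practice the candidate list might come from a SPARQL query over labels, aliases and postal codes. As noted above, that only helps where postal codes are actually present.

```python
def resolve_place(name, postal_code, places):
    """Resolve a place name to one QID, or None if that is not possible.

    `places` is a list of dicts with keys "qid", "names" (labels and
    aliases) and "postal_codes". A given postal code must match too.
    """
    wanted = name.casefold()
    candidates = [
        p for p in places
        if wanted in {n.casefold() for n in p["names"]}
    ]
    if postal_code:
        candidates = [p for p in candidates if postal_code in p["postal_codes"]]
    # Zero or several candidates: refuse to guess.
    if len(candidates) == 1:
        return candidates[0]["qid"]
    return None
```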


  • This looks like item re-purposing with a misleading edit summary. Please stop both.
    --- Jura 12:47, 1 August 2017 (UTC)

Double WikidataID

Hi Andra,

Let me introduce myself: I'm a PhD student of Egon (no last name needed, I think ;) ...). Today I found a duplicate of entry Q24745328: Q31202267. Are you okay with removing the latter? And a small question: is it possible in Wikidata to flag duplicates?

Kind regards,

Denise.

Self-referencing

Hi,

Your bot added two self-referencing statements to Actrapid (Q2034113).

Could you look at them and correct them, if possible?

Regards, VIGNERON (talk) 13:17, 5 August 2017 (UTC)

BCE/KBO

Hello. I have noticed that you described Q29576195 as a business enterprise (Q4830453) because it is present in KBO/BCE. IMHO, this is not correct. Q29576195 is an ASBL/VZW and is primarily a research centre. More generally, being present in KBO/BCE does not make an entity a business enterprise, even though the O stands for ondernemingen ("enterprises"). All Belgian legal entities are listed there. For example: Federale Overheidsdienst Financiën, KUL, RVA..., which are not business enterprises. Best regards, BrightRaven (talk) 09:31, 9 August 2017 (UTC)

It seems that you made all those changes with a bot. This is really not good. Why did you remove all statements in Property:P31 and replace them with Q4830453? This makes no sense: Q313966 is a business enterprise, OK, but it is better described as a bank, as it was before your action. Same for Q3512080, better described as a transport company and a government-owned corporation. Same thing for the label: the official name in KBO/BCE is not always the usual name. There was no reason for this massive, automatic action. Please avoid such things in the future. I added all those Belgian enterprise numbers to those items, and I regret that this allowed you to erase a lot of correct, useful data. BrightRaven (talk) 10:21, 9 August 2017 (UTC)
@BrightRaven: I agree with you that stating something is a business enterprise because it is listed in the KBO is inaccurate. I have stopped my efforts until I have a more accurate way of describing what is stored in the KBO; this requires some further analysis. I did not delete all the P31 statements. They were only overwritten where the original statement had no reference, because there is no way of knowing whether such a statement is accurate. I understand that this is frowned upon, so in future efforts I will leave them intact. That said, in the long term I do intend to start a discussion on how to deal with unreferenced statements. They do add some noise, because there is no way of validating whether such a statement is accurate. So in the long term, I would argue for overwriting P31 statements that lack a proper reference, provided the replacement has one. Until we have had that discussion, I will keep unreferenced P31 statements. --Andrawaag (talk) 10:32, 9 August 2017 (UTC)
Even the European Parliament had become a business company. This is really good referenced data! More seriously, Property:P31 is often difficult to reference. In most case, this must be done on case-by-case basis, in reviewing sources (often the sources of the Wikipedia articles). Except in limited cases, it cannot be done automatically with a bot. BrightRaven (talk) 10:38, 9 August 2017 (UTC)

The automatic replacement of the labels is also wrong. Sometimes it is fully capitalized, as Sjoerd de Bruin noticed here above, but there are other problems: your action on SONACA item had the consequence that this item was impossible to find by writing "SONACA" (the most common name of this company) in French ("SONACA" was the label, you erased the label with the official name, so there was no "SONACA" any more in French in the labels or aliases). Moreover, the official name is not always the most common name. The subsidiaries of SRWT are rarely named by their official names (see [13] for example). Same for Filigranes: the official name is never used in communication to the public. Please review Help:Label: "The label is the most common name that the item would be known by." So do not replace any more the labels with the official names from KBO/BCE. This can be done only on case-by-case basis. (However, I think it could be added as an alias automatically.) (And labels do not have to be referenced.) BrightRaven (talk) 10:54, 9 August 2017 (UTC)
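BrightRaven's suggestion is to keep the existing label and, at most, add the official KBO/BCE name as an alias. It can be sketched as follows.

The function name and inputs are hypothetical. Names are compared case-insensitively, so an all-caps official name is not added next to a label that differs only in case.

```python
def add_official_name_as_alias(label, aliases, official_name):
    """Return a new alias list that includes `official_name`.

    The label is never modified; it is only read to avoid adding an
    alias that merely repeats it. Comparison ignores case.
    """
    known = {a.casefold() for a in aliases}
    if label:
        known.add(label.casefold())
    if official_name.casefold() in known:
        return list(aliases)
    return list(aliases) + [official_name]
```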

Wikidata:Requests for deletions#Q28031601

Hey Andrawaag, can you please comment in Wikidata:Requests for deletions#Q28031601? Thanks, MisterSynergy (talk) 05:46, 17 August 2017 (UTC)

Property for Identifiers.org

Hi, I think you might be interested in this topic, and I'd like to know your views. Thanks. https://wikidata.org/wiki/Wikidata:Property_proposal/Identifiers.org_namespace --~~ Yayamamo (talk) 01:47, 26 August 2017 (UTC)

Description of excessive length

Hi, I have now reverted your ProteinBoxBot's edit for the umpteenth time. Can you please train it not to add excessively long descriptions? According to Help:Description, descriptions should be short; there is no need to describe the whole chemical process in this field. Thanks. Csigabi (talk) 12:28, 2 October 2017 (UTC)

It would be very polite if the bot refrained from replacing descriptions and labels. Sjoerd de Bruin (talk) 13:52, 2 October 2017 (UTC)
The bot is adapted to prevent this in the future: [14] --Andrawaag (talk) 14:49, 3 October 2017 (UTC)

ProteinBoxBot uses of as (DEPRECATED) (P794)

Hello there. Your ProteinBoxBot created large numbers of items with statements in the format regulates (molecular biology) (P128): biological process (Q2996394) / as (DEPRECATED) (P794): negative regulation of biological process (Q22260640). These contain the qualifier as (DEPRECATED) (P794), which we are trying to deprecate because it cannot be translated. Your input would be most welcome at this discussion. Deryck Chan (talk) 18:07, 2 November 2017 (UTC)