User talk:Harej


Welcome to Wikidata, Harej!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, please ask me on my talk page. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here and become an active editor for Wikidata.

Best regards!

--ValterVB (talk) 21:56, 24 February 2014 (UTC)

Recent mass creation

I recently created some 1,773 items for WikiProjects on English Wikipedia. This helps fill an infrastructural role in finding WikiProjects and coordinating across different projects. I have created each item and have associated them with their respective enwiki pages; I am prepared to fill in labels, descriptions, etc. but it is late so I will resume this work in the morning. Harej (talk) 09:57, 24 December 2015 (UTC)

When changing a label ...

When you feel inclined to change the label of a property or item, I would be appreciative if you would as a minimum put the old label as an alternative. It is a right PITA to have known labels disappear and have to go hunting. Thanks.  — billinghurst sDrewth 11:40, 26 December 2015 (UTC)

Also note that the scans are broader than documents: they are books, music, papers, records, documents ... so we need a more generic label.  — billinghurst sDrewth 11:43, 26 December 2015 (UTC)
Not a problem Billinghurst, I will make sure to put the old label as an alias from now on. Also, I would consider books, sheet music, records, etc. to be forms of documents (with "document" being a broad umbrella term) though I imagine others may disagree. Harej (talk) 04:08, 27 December 2015 (UTC)
/me shrugs finding the sweet spot is a beast. wikt:document  — billinghurst sDrewth 14:38, 28 December 2015 (UTC)

film duration (minutes) as input via quick statements

Via Project chat I am reaching out to ask if you solved this, b/c I ran into it and heard it was not (yet) available. Any news? --Jane023 (talk) 12:32, 25 February 2016 (UTC)

Wikidata:Database reports/Constraint violations/P2093

Hi Harej,

The other day I added a format constraint to the property. It should detect incomplete entries. What do you think?
--- Jura 05:37, 28 May 2016 (UTC)

Published in

Hi, Harej! Honestly, I don't see the point in edits like these. Revista de Indias published in Revista de Indias (?) I think this property was intended for use with articles, as in "Title of an article in Revista de Indias" published in Revista de Indias. Cheers and thanks for your edits! Strakhov (talk) 21:13, 10 September 2016 (UTC)

Strakhov, most of the edits are for stating that journal articles are published in journals. Those I would hope you agree are useful. This, however, is an edge case I appear to not have accounted for. I will undo this particular one; I can use SPARQL to find/remove the other ones. Harej (talk) 21:15, 10 September 2016 (UTC)
Of course those ones are useful. :) Thanks! Strakhov (talk) 21:17, 10 September 2016 (UTC)
It appears to have happened again. However, I have since added a check that should prevent it from happening again. Harej (talk) 00:55, 11 September 2016 (UTC)
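A check like the one mentioned above could look roughly like the following. This is only a minimal sketch, not the actual check that was added: it assumes statements are held as (subject, property, value) tuples, that "published in" is property P1433, and the QIDs and the helper name `drop_self_references` are hypothetical placeholders.

```python
def drop_self_references(statements):
    """Keep only (subject, property, value) triples whose value differs from the subject."""
    return [s for s in statements if s[0] != s[2]]


# Hypothetical placeholder QIDs, for illustration only.
statements = [
    ("Q100", "P1433", "Q200"),  # an article published in a journal: kept
    ("Q200", "P1433", "Q200"),  # a journal "published in" itself: dropped
]
filtered = drop_self_references(statements)
```

Filtering before the statements are submitted means the edge case never reaches the site, so no cleanup query is needed afterwards.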

Flooding

When making a very large number of changes very fast, please consider using a flood flag or a bot. Patrolling becomes very difficult otherwise. --Yair rand (talk) 18:12, 6 November 2016 (UTC)

Yair rand, sounds reasonable. I have now made the request. Harej (talk) 18:22, 6 November 2016 (UTC)

publication date

Over the past few months you have created items for many publications. Unfortunately, you often (~2500 times) added "bad values" for publication date (P577): you supplied only a year but set the precision to month. An example is Q28088794, where publication date (P577) is set to "+2015-00-00T00:00:00Z" with precision month. You can find a full list of errors at [1]. --Pasleim (talk) 15:52, 3 January 2017 (UTC)

Pasleim, thank you for bringing this to my attention. I will see what I can do to fix the existing bad values and prevent it from happening in future instances. Harej (talk) 16:24, 3 January 2017 (UTC)
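One way to avoid this class of error is to derive the precision from the date parts that are actually known. The sketch below is illustrative only: it assumes the Wikibase precision codes 9 (year), 10 (month) and 11 (day), returns a simplified value with just the `time` and `precision` fields, and the helper name `wikibase_time` is hypothetical.

```python
# Wikibase time precision codes (assumed here: 9 = year, 10 = month, 11 = day).
PRECISION_YEAR = 9
PRECISION_MONTH = 10
PRECISION_DAY = 11


def wikibase_time(year, month=None, day=None):
    """Build a simplified time value whose precision matches the parts supplied."""
    if month is None:
        # Year only: month and day are zeroed and the precision says "year".
        month_part, day_part, precision = 0, 0, PRECISION_YEAR
    elif day is None:
        # Year and month known: day is zeroed and the precision says "month".
        month_part, day_part, precision = month, 0, PRECISION_MONTH
    else:
        month_part, day_part, precision = month, day, PRECISION_DAY
    return {
        "time": f"+{year:04d}-{month_part:02d}-{day_part:02d}T00:00:00Z",
        "precision": precision,
    }
```

With this approach, a year-only record such as 2015 yields "+2015-00-00T00:00:00Z" with precision 9 rather than precision 10.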

Double entry on article

no label (Q28210548) and no label (Q28210548)? and another one: Dynamic causal modelling (Q28160135) and [2]. You might want to check your bot — Finn Årup Nielsen (fnielsen) (talk) 10:10, 12 January 2017 (UTC)

Yet another one: no label (Q28204947) and Dyslexia: cultural diversity and biological unity (Q28175733) Finn Årup Nielsen (fnielsen) (talk) 11:24, 12 January 2017 (UTC)
no label (Q28202818) and The role of working memory in visual selective attention (Q28161576). I think you have something of a problem here. — Finn Årup Nielsen (fnielsen) (talk) 11:31, 12 January 2017 (UTC)
Fnielsen, thank you for bringing this to my attention. Given the timestamps it's not the earlier cases of duplication caused by browser issues I was having, so there must be duplication finding its way into the dataset. I will need to investigate further (and possibly re-do the workflow.) Harej (talk) 18:11, 12 January 2017 (UTC)
Ooof. This is... not good. I am going to hold off on creating entries for now and work on merges. Harej (talk) 18:47, 12 January 2017 (UTC)

Epic merger underway. I am sorry this happened. My strategy for creating items clearly did not work so I am dropping it and will investigate different options. Harej (talk) 19:07, 12 January 2017 (UTC)

Thanks for your work. Sorry to hear about the duplication. I wonder if there is any way I can help. The duplicates I have seen have all had a PMID (and not a DOI, if I remember correctly). The constraint report says 3499 at the moment: Wikidata:Database reports/Constraint violations/P698. There was a large expansion of the constraint report on 12 January 2017. — Finn Årup Nielsen (fnielsen) (talk) 23:41, 12 January 2017 (UTC)
I see my first example was confusing (it was the same item). It should have been Dynamic representations and generative models of brain function (Q28181164) and no label (Q28210548). — Finn Årup Nielsen (fnielsen) (talk) 23:45, 12 January 2017 (UTC)
Fnielsen, it was confusing. :) Thank you for clarifying.
According to the Wikidata Query Service there were over 20,000(!!!) duplicates. I currently have Quick Statements working on merging duplicated entries together; it should be done in a few hours. Right now I am working on creating entries based on the latest data on scholarly articles cited on Wikipedia, and there's a lot of them. However, my attempt to scale up my Quick Statements work seems to have gone awry. Despite de-duplicating the identifiers from the original dataset, it is possible that duplicate Quick Statements commands made it in. Which illustrates the fundamental problem of trying to make over a million edits that way – it's too inefficient and error prone once you're doing more than a handful. I guess it's time for me to get to work on making a proper bot. Harej (talk) 23:56, 12 January 2017 (UTC)
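The failure mode described above, where duplicate commands slip in even after the identifiers were de-duplicated, can be guarded against by tracking identifiers at the moment commands are generated. The following is a minimal sketch under stated assumptions: each record is a dict with a `pmid` key, the output imitates tab-separated QuickStatements-style `CREATE` / `LAST` lines, P698 is the PubMed ID property, and `build_commands` is a hypothetical helper rather than the workflow that was actually used.

```python
def build_commands(records):
    """Emit one CREATE block per distinct PMID, skipping identifiers already seen."""
    seen = set()
    commands = []
    for record in records:
        # Normalise before comparing so " 111" and "111" count as the same ID.
        pmid = str(record["pmid"]).strip()
        if pmid in seen:
            continue
        seen.add(pmid)
        commands.append("CREATE")
        commands.append(f'LAST\tP698\t"{pmid}"')
    return commands


# Hypothetical input with a disguised duplicate.
records = [{"pmid": "111"}, {"pmid": " 111"}, {"pmid": 222}]
commands = build_commands(records)
```

This only prevents duplicates within one batch; it does not check whether an item with the same identifier already exists on the site.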

main theme links to journal instead of topic

I have seen that a main theme sometimes gets linked to a journal instead of a topic when they share a similar name. There is one case for you here [3]. I suspect it might be an edit in connection with Magnus Manske's tools? I am not sure what the best way is to warn editors about the issue. — Finn Årup Nielsen (fnielsen) (talk) 17:10, 17 January 2017 (UTC)

Fnielsen, it is a known issue with Magnus Manske's SourceMD tool. This SPARQL query identifies existing cases of the problem. I should go through and fix those. Harej (talk) 17:14, 17 January 2017 (UTC)
Yeah, there seem to be loads of issues from yesterday. — Finn Årup Nielsen (fnielsen) (talk) 17:17, 17 January 2017 (UTC)
That's because topics are given as free text, so I do a Wikidata text search for the topic text. If that's "Infectious Diseases", the journal might be the first result from the Wikidata API. --Magnus Manske (talk) 13:56, 19 January 2017 (UTC)

Missing author in a paper

I found two missing authors on A functional neuroanatomy of hallucinations in schizophrenia (Q28283938), see [4] from a quickstatement insert last Friday. I wonder if there is an issue in Magnus' or your tools somewhere? — Finn Årup Nielsen (fnielsen) (talk) 10:18, 19 January 2017 (UTC)

PubmedID unique?

I found some duplicates. Wouldn't it be helpful to have a constraint report, and/or a tool to merge items with the same title and PubMed ID? Edoderoo (talk) 19:33, 11 February 2017 (UTC)

Item to be deleted

One or more items created by you have been proposed for deletion at RFD. If you do not agree, you can participate in the debate. --ValterVB (talk) 21:23, 3 March 2017 (UTC)

Duplicate efforts by you and Daniel Mietchen

Hi James,

By merging Complete Genome Sequence of an Aerobic Hyper-thermophilic Crenarchaeon, Aeropyrum pernix K1 (Q22066070) you've created duplicate statements for author, title and pages. Could you please check whether this error has also occurred on other items about publications? --TIB-NOA (talk) 11:28, 29 March 2017 (UTC)

TIB-NOA, I have been looking into it and have noticed that much of the duplication comes from me using DOIs as a starting point while Daniel Mietchen uses PMCIDs. I will be working on a solution to this problem that involves proactive comparisons between databases. Harej (talk) 11:40, 11 April 2017 (UTC)

Duplicates

Hello Harej, I noticed that you have created some duplicate entries, see for instance here for an example I found at Wikidata:Database reports/Constraint violations/P236. Perhaps you could take care to avoid these. But another one is more tricky. It seems that the ISSN mentioned at Understanding Complex Systems (Q29043656) does not belong to the same title as the item gives. Could you look into that? Thank you. Kind regards, Lymantria (talk) 17:09, 29 March 2017 (UTC)

Lymantria, I will work on consolidating the duplicate entries. As I responded above, I will be working on a system to proactively compare the different research databases. The discrepancy between the name of the item and what the ISSN corresponds to is interesting. These entries are typically programmatically generated based on Worldcat data; I will look and see what the correct journal title should be. Harej (talk) 11:42, 11 April 2017 (UTC)
In the meantime, I just changed the label for the item, since it doesn't appear that anything linked to the item anyway. Harej (talk) 11:43, 11 April 2017 (UTC)

HTML-Tags in label

For example, see Redescription of the eagle rays Myliobatis hamlyni Ogilby, 1911 and M. tobijei Bleeker, 1854 (Myliobatiformes: Myliobatidae) from the East Indo-West Pacific (Q29397209). This is a well-known, old error and easily avoidable. --Succu (talk) 14:09, 17 April 2017 (UTC)

Succu, turns out I was filtering out HTML entities for the title property but not the label. Ouch! I will work on cleaning it up. Harej (talk) 19:58, 22 April 2017 (UTC)
Yep. ;) Simply use the title as the label, but be aware of the different length limitations. Regards --Succu (talk) 21:03, 22 April 2017 (UTC)
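The advice in this thread (clean the text once, use it for both title and label, and respect the label's length limit) can be sketched as below. This is an illustration only: the 250-character limit is an assumption to be checked against the site's current configuration, the tag-stripping regex is deliberately crude, and `clean_label` is a hypothetical helper.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
MAX_LABEL_LENGTH = 250  # assumed limit; verify against the live configuration


def clean_label(title, max_length=MAX_LABEL_LENGTH):
    """Decode HTML entities, drop tags, collapse whitespace, and truncate."""
    text = html.unescape(title)       # "&amp;" -> "&", "&lt;i&gt;" -> "<i>"
    text = TAG_RE.sub("", text)       # remove anything that looks like a tag
    text = " ".join(text.split())     # collapse runs of whitespace
    return text[:max_length].rstrip()


raw = "Redescription of <i>Myliobatis hamlyni</i> &amp; <i>M. tobijei</i>"
label = clean_label(raw)
```

Because entities are decoded before tags are stripped, a title that legitimately contains a literal "<" followed later by ">" would lose the text in between; a real cleanup script would probably want a stricter list of tag names.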

OpenCitations and introduction of errors

In my work integrating the OpenCitations Corpus I have noticed errors, including inappropriate identifiers being added for articles and DOIs with formatting errors. I will fix these once the script has finished running. I will also inform the maintainers of the data source of the errors I discovered so they won't appear in future runs. Harej (talk) 17:57, 29 May 2017 (UTC)

Admin noticeboard

Please note the discussion about your edit rate.
--- Jura 16:41, 23 August 2017 (UTC)

Slow down

Hey, your bot is making 250 edits per minute, which is causing problems for the infrastructure. Please slow it down to 60 per minute or I will need to block you. Amir (talk) 12:06, 13 September 2017 (UTC)

Amir, to confirm what I posted on IRC, I have stopped the edits for now and will throttle to run at a lower speed. Harej (talk) 12:20, 13 September 2017 (UTC)
Thank you. Amir (talk) 12:25, 13 September 2017 (UTC)

I just blocked you; you are not slowing down. It was 500 edits in two minutes. Please keep it below 60 / minute. Amir (talk) 11:31, 2 October 2017 (UTC)
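Staying under the requested ceiling of 60 edits per minute amounts to enforcing a minimum gap between consecutive edits. Below is a minimal sketch of such a throttle, not the bot's actual code: the class name `Throttle` is hypothetical, and the `clock` and `sleep` parameters exist only so the behaviour can be checked without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls to wait()."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous permitted edit

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

A bot loop would call `throttle.wait()` immediately before each edit. Spacing edits evenly also avoids bursts such as 500 edits in two minutes, which a simple per-minute counter could still allow at a window boundary.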

DOI.org query endpoint

Hello, I stumbled across an item you created based on information from the DOI query endpoint: Cheshire Castles of the Irish Sea Cultural Zone (Q29037779). I'm interested in adding some bibliographic information on archaeological journals to Wikidata, but have been processing a single DOI at a time – I'm almost certain that's not the smartest way of doing it. Would you know of a workflow which speeds up the process? Richard Nevell (talk) 11:03, 5 November 2017 (UTC)

Richard Nevell, I created a Python application for bulk-creating Wikidata items. There's also WikidataIntegrator which is used by my BiblioWikidata tool but also has its own tool for creating entries from PMIDs. There's also Fatameh, which uses PMIDs and PMCIDs. Harej (talk) 01:44, 8 November 2017 (UTC)

Thanks for your "updating citation graph"

@Harej: I see you have added many citation statements for scientific articles and I thank you very much for all your effort. I was wondering if you plan to continue this and to add the missing citation statements for other articles. Mahdimoqri (talk) 01:30, 21 March 2018 (UTC)

Mahdimoqri, I've been busy with many other things over the past... year. So I've fallen well behind on this project (and in responding to talk page messages as you can see). One of the issues I had run into is that Wikidata simply grew too big for my usual strategies to work. So I had to come up with some other ones. I think I was about halfway through my work before being distracted. However this weekend I have some time and I hope to make some more progress. Harej (talk) 10:54, 19 May 2018 (UTC)

A new elasmosaurid from the early Maastrichtian of Angola and the implications of girdle morphology on swimming style in plesiosaurs

Hi, James. I'm interested in creating Wikidata items related to publications in paleontology. Before I saw that you had created an item for the paper in the title of this section I created items for each of its authors. Then I saw your item for the paper used some kind of string format rather than referring to items for the authors. Was it a bad idea for me to have created those items or should the current item for the paper be reformatted to refer to them for authorship data? I'm not very experienced here on Wikidata and I was hoping you could offer some guidance on how to handle situations like this and about handling authorship more generally. Abyssal (talk) 17:25, 7 May 2018 (UTC)

Abyssal, the main thing is that when creating items for the author (P50), you have to know that a given person with a given name is that person with that name, and not some other person who happens to have the same name. Usually this is achieved by associating the authorship of a paper with something like an ORCID iD (Q51044). In other words, it's important to know who the author is, and not just what their name is. That said if you just did it for a few articles it's probably not the end of the world, but not something I would want to do on every Wikidata item about journal articles. Harej (talk) 10:56, 19 May 2018 (UTC)

Strange item

Hi,

Something went wrong on the creation of no label (Q32979723). Could you take a look?

Cdlt, VIGNERON (talk) 16:55, 27 June 2018 (UTC)

Looks strange, VIGNERON. I can't figure out what the "correct" ISBN is, so I removed the one inbound link to the item and I recommend that it be deleted. Harej (talk) 21:02, 27 June 2018 (UTC)

DOI Cleanup

I don't understand the rationale behind the recent DOI cleanup started here: https://tools.wmflabs.org/quickstatements/#/batch/4246. Have you documented it somewhere? John Samuel 08:20, 8 September 2018 (UTC)

GerardM, Jsamwrites, sorry for the confusion. Quickstatements Bot will be re-adding DOIs for each entry it's removing it from, but with capitalized letters and HTML entities converted into regular characters. Next time I will have it add then remove. Harej (talk) 16:38, 8 September 2018 (UTC)
Why have all capitals when the papers themselves have it in lowercase? Thanks, GerardM (talk) 16:42, 8 September 2018 (UTC)
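The normalisation described in this thread (HTML entities converted to regular characters, letters capitalised) can be sketched as follows. DOI names are compared case-insensitively, so upper-casing is one way to get a single canonical form and avoid two spellings of the same DOI on different items. The helper name `normalize_doi` and the example DOI are hypothetical, and this is not the batch's actual code.

```python
import html


def normalize_doi(doi):
    """Decode HTML entities and upper-case a DOI string."""
    return html.unescape(doi.strip()).upper()


# Hypothetical DOI containing escaped angle brackets.
example = normalize_doi(" 10.1234/abc&lt;1&gt;def ")
```

Adding the normalised value before removing the old one, as proposed above, keeps every item resolvable throughout the batch.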

Removal of DOI

Why remove perfectly functional DOI? Thanks, GerardM (talk) 13:35, 8 September 2018 (UTC)