Topic on User talk:Magnus Manske


Bad QuickStatements edits with no attribution

Bovlb (talkcontribs)

This edit conflated two people, not only adding a second ORCID to an item that already had one, but one which was already on a distinct item. I would complain to the person who issued the QuickStatements batch, but I don't know how to determine who that was. Cheers!

Bovlb (talkcontribs)

Q64867660 is another case where an item was created with an ORCID already on an existing item, where the author and intention of the QuickStatements batch is not apparent.

Bovlb (talkcontribs)

Q64860817 and Q64860820 are a case of items created with the same ORCID within a few seconds of each other, again with no way to track the author.
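The kind of duplicate described in the posts above (one ORCID appearing on more than one item) can be detected by grouping items by identifier. What follows is only a minimal sketch: the function name `find_shared_orcids` is hypothetical, and the ORCID values in the sample are placeholders, not the real identifiers on those items.

```python
from collections import defaultdict


def find_shared_orcids(pairs):
    """Return {orcid: sorted item ids} for every ORCID found on more than one item.

    `pairs` is an iterable of (item_id, orcid) tuples, e.g. from a query export.
    """
    items_by_orcid = defaultdict(set)
    for item_id, orcid in pairs:
        items_by_orcid[orcid].add(item_id)
    # Keep only identifiers that are attached to two or more distinct items.
    return {
        orcid: sorted(items)
        for orcid, items in items_by_orcid.items()
        if len(items) > 1
    }


# Item ids are taken from the thread; the ORCID values are placeholders.
sample = [
    ("Q64860817", "0000-0000-0000-0001"),
    ("Q64860820", "0000-0000-0000-0001"),
    ("Q64867660", "0000-0000-0000-0002"),
]

duplicates = find_shared_orcids(sample)
print(duplicates)
```

A check like this only finds the duplicates after the fact; it says nothing about who created them, which is the attribution problem raised in this thread.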

Bovlb (talkcontribs)

I have consulted the RFP for this bot, and it clearly states that it will have "both batch and submitting user indicated and linked in the edit comment". Recent edits appear to violate this. I note that I brought this to your attention three months ago. I would appreciate some feedback from you, as the creation of duplicate items is disruptive, and this lack of attribution prevents an effective response.

Magnus Manske (talkcontribs)

I have deactivated SourceMD for now, until I can figure out what's wrong.

GerardM (talkcontribs)

What I have noticed while fixing authors is that these instances have largely gone away, particularly after the latest update of the software. You may argue that it is disruptive when duplicate records are created. The loss of this service is much more disruptive. Yes, I do merge duplicate records.

Thanks,

Bovlb (talkcontribs)

GerardM: Please feel free to work on the backlog of ORCID duplicates in the meantime. https://w.wiki/5ZJ

GerardM (talkcontribs)

There are regular reports on the ORCID duplicates. Check them and you will see my handiwork. I do not run queries.

Bovlb (talkcontribs)

I'm still seeing cases within the last month of bad QuickStatements changes that have no useful edit summary, and hence no way to track down who is making the errors. I am surprised that this could possibly be intended or permitted. Looking at the most recent bot changes from today, I see many entries with the summary "#quickstatements; invoked by SourceMD:ORCIDator". Referring to Wikidata:Requests_for_permissions/Bot/QuickStatementsBot, I see an undertaking that "both batch and submitting user [are] indicated and linked in the edit comment", which suggests that this bot is in violation of its request for permissions. This appears to be a long-standing problem that has been brought to your attention repeatedly. Can we please put an end to this?
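To make the complaint concrete, a check for whether an edit summary carries both a batch link and a user link might look like the sketch below. The exact format of an attributed summary is not given in this thread, so both regular expressions and the `attributed` example string are assumptions for illustration; only the unattributed summary is quoted from the bot's recent edits.

```python
import re

# Assumed wikilink shapes; the real attributed format is not stated in this thread.
BATCH_LINK = re.compile(r"\[\[[^\]]*batch/\d+[^\]]*\]\]")
USER_LINK = re.compile(r"\[\[User:[^\]|]+(\|[^\]]*)?\]\]")


def is_attributed(summary):
    """Return True if the summary links both a batch and a submitting user."""
    has_batch = BATCH_LINK.search(summary) is not None
    has_user = USER_LINK.search(summary) is not None
    return has_batch and has_user


# Quoted from the thread: no batch, no user.
unattributed = "#quickstatements; invoked by SourceMD:ORCIDator"

# Hypothetical example of a summary that would satisfy the undertaking.
attributed = (
    "#quickstatements; "
    "[[:toollabs:quickstatements/#/batch/1|batch #1]] by [[User:Example|]]"
)

print(is_attributed(unattributed))
print(is_attributed(attributed))
```

Run over a list of recent bot edit summaries, a filter like this would separate the edits that can be traced to a submitting user from those that cannot.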

GerardM (talkcontribs)

Hoi, so what is the problem? It is the most important tool available for linking people and papers in Wikidata based on information from ORCID. Its use is necessary because it provides the primary data behind the Scholia presentation. Its results tend to be good. When you point to a statement as problematic, the question is why it came up in the first place: where is the data in error for this to happen? Given the quality of our runtime environment, it is easy to notice that edits get refused because of spurious timeouts, and consequently it is not a given that processed data ends up in the database as intended. To me it is not obvious that you are not throwing out the baby with the bathwater. Thanks, but no thanks.

Bovlb (talkcontribs)

@GerardM: Specifically, the problem is that bad edits like the diff linked above cannot be traced back to find who is making the bad edits and why. This means that we lack an essential tool for improving our process. It is far better to educate editors (and fix tools) so that we introduce fewer errors in the future, rather than merely find ways to detect and fix errors after the fact. I have found (and fixed) many such cases of incorrect assignment of Scopus Author ID (P1153), but I have been unable to find out who is making these errors, how, and why.

More generally, it is not good practice for bots to fail to do the things that were promised in their request for permissions. Circumstances change, of course, and we don't need to be over-pedantic on this technical point, but in this case the promised behaviour is clearly highly desirable, and it is a mystery to me how we apparently came to drop it.

GerardM (talkcontribs)

Hoi, surprise: data can be dirty, and we are talking about data not at our end but at the end of Scopus (an organisation that does not care about us), OCLC, VIAF (including all the library authorities of this world) and ORCID. So it is not the bots that fail; it is the data that fails us.

Now here is something to consider: how can we be the place where authorities come together if we do not take the data, warts and all? The desired behaviour is a pipe dream when at the same time you want to end up with data that is meaningful and worthwhile. Given that you are a data professional (as per your user page), you should understand this well.

As to data cleaning, I merge quite a number of items. For me the key thing is that with more data merged, chances of keeping the data clean improve. The interoperability of data improves.

The notion that we should stay away from datasources is absolutely painful. We have lost years by not accepting data that is/was no better than the data we have/had. For me, the notion that we can build Wikidata and keep it clean is false.


Bovlb (talkcontribs)

"The notion that we should stay away from datasources is absolutely painful."

Could you please explain how your response is related to the issue I am raising?

GerardM (talkcontribs)

SourceMD takes info from sources like ORCID and assumes that the data is fine. Typically it is. The notion that SourceMD is banned because of errors elsewhere is, for me, absolutely painful. It is a tradition: we have scorned the data from Freebase and many others. The arguments are based on the quality of single items, not the quality of sets and subsets.

Bovlb (talkcontribs)

Thanks for responding. I am still confused about how your point relates to this issue. I am not trying to ban SourceMD.

What I am seeking is that, when QuickStatementsBot acts on behalf of a user, that user is identified in the edit summary. Not only does this seem like a reasonable request, but it also appears to be promised in the bot's RFP. Edits should be assignable. Either the bot author is taking on responsibility for these edits, or they should be attributed to another editor. So far as I can tell from the RFP, QuickStatementsBot falls into the latter category.

GerardM (talkcontribs)

I am not the author. I use this tool, and for my purposes it is vitally important. While your arguments are reasonable when addressed to an author, they would force the users of the tool to abandon it. THAT is not reasonable.

Bovlb (talkcontribs)

If I understand what you're saying correctly, you are taking the position that if the SourceMD tool were to record the user responsible for each change, then you would have to stop using it. In other words, you can only use the SourceMD tool on condition of anonymity.

This is a startling claim. Could you explain your reasons?

GerardM (talkcontribs)

No, I am more than happy for the tool to register me as the user. What I am not happy to do is refrain from using the tool.
