Topic on User talk:Magnus Manske


More problems with duplicate authors

ArthurPSmith (talk | contribs)

Hi - I've just spent many hours cleaning up after another few hundred duplicate-authors actions; this tool is *really* not ready for prime time. Latest example: https://www.wikidata.org/w/index.php?title=Q4757048&diff=2133123022&oldid=2118661421 - how can you merge "Andrew G. White" and "Alex White"? There have been far too many merges like this that I've had to revert, and I'm concerned that somebody will go in and re-merge them if the tool keeps recommending it. I'm also finding many incorrect merges where the likely cause is that a paper by one author was attributed in Wikidata to the other person's item, so your tool concludes they are the same person. I had a case yesterday where a political scientist with the same name as a biochemist was set as the author of some biochemistry papers, and your tool was then used to merge the two. What would actually have been useful in a case like that is for the tool to point out that those biochemistry papers had likely been assigned to the wrong author. Can you adjust your tool to do that? In any case, until some significant fixes are in place, this needs to be turned off ASAP.
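To make the "point out the likely wrong assignment" request concrete, here is a minimal sketch of one way it could work. It is not the tool's actual code: the function name `flag_outlier_works`, the `min_share` threshold, and the idea that each work already carries a single field label are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def flag_outlier_works(work_fields: Dict[str, str], min_share: float = 0.2) -> List[str]:
    """Return IDs of works whose field is rare among one author's works.

    work_fields maps a work ID to a field label (hypothetical input; how the
    field is obtained is left open). A work is flagged when its field accounts
    for less than min_share of all the author's works.
    """
    if not work_fields:
        return []
    counts = Counter(work_fields.values())
    total = len(work_fields)
    return sorted(
        work for work, field in work_fields.items()
        if counts[field] / total < min_share
    )


# Hypothetical author item: nine political science papers, one biochemistry paper.
example = {f"W{i}": "political science" for i in range(1, 10)}
example["W10"] = "biochemistry"
flagged = flag_outlier_works(example)
```

Works flagged this way would be shown to a human as possible mis-assignments rather than being used as evidence for a merge.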

ArthurPSmith (talk | contribs)

By the way, part of the problem (including the political scientist case) is that ORCID itself sometimes has these bad author assignments; some of the data in ORCID comes from services like Scopus, which can be mistaken on things like this.

ArthurPSmith (talk | contribs)

Another common issue I'm seeing is two different authors with the same family name and the same first initial co-authoring a paper. This often happens with husband-wife teams; other family relationships can also lead to co-authored papers, and two individuals who simply come from the same region of the world may well share a surname. Your duplicate authors game will always try to merge these cases, because being co-authors on a paper means each of them is a co-author of all the other authors on that paper. Instead, I think it should treat cases like this as a sign that they *are not* the same person. Having both Wikidata IDs as co-authors on the same work should be an indicator that they are distinct, not the same. I know there are exceptions where real duplicates exist, but at the least these cases should be handled differently from ones where the two are never co-authors.
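The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how the tool is implemented: `works_by_author`, `shared_works`, `classify_candidate_pair`, and the two label strings are hypothetical names, and the input is assumed to be a mapping from author item ID to the set of work IDs that list that item as an author.

```python
from typing import Dict, Set


def shared_works(works_by_author: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Works that list both author items a and b as authors."""
    return works_by_author.get(a, set()) & works_by_author.get(b, set())


def classify_candidate_pair(works_by_author: Dict[str, Set[str]], a: str, b: str) -> str:
    """Label a candidate duplicate pair.

    If both items appear as authors on the same work, treat that as evidence
    that they are distinct people and route the pair to separate handling
    instead of suggesting a merge.
    """
    if shared_works(works_by_author, a, b):
        return "likely-distinct"
    return "possible-duplicate"


# Hypothetical data: Q1 and Q2 are both authors of W2; Q3 shares no work with Q1.
works = {"Q1": {"W1", "W2"}, "Q2": {"W2", "W3"}, "Q3": {"W4"}}
pair_a = classify_candidate_pair(works, "Q1", "Q2")
pair_b = classify_candidate_pair(works, "Q1", "Q3")
```

Pairs labelled "likely-distinct" could still be reviewed by hand for the rare real duplicates, but they would not be offered as ordinary merge suggestions.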
