Wikidata:Requests for permissions/Bot/FischBot 8
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
FischBot
Operator: Pyfisch
Task/s: Remove date of birth and date of death statements sourced only from VIAF.
Code: not public at this point
Function details: The bot works from a list of VIAF records linked to Wikidata items. For each item it removes all date of birth (P569) and date of death (P570) statements whose only source is stated in (P248): Virtual International Authority File (Q54919). If a statement has additional sources, only the VIAF reference is removed.
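Since the bot's code is not public, the following is only a minimal sketch of the rule described above, not the operator's implementation. It assumes entities are available as plain dictionaries shaped like the Wikibase JSON export (`claims`, `references`, `snaks`); the helper names `cites_viaf` and `plan_edits` are hypothetical, and the function only plans edits rather than performing them.

```python
VIAF = "Q54919"                 # Virtual International Authority File
STATED_IN = "P248"              # stated in
DATE_PROPS = ("P569", "P570")   # date of birth, date of death


def cites_viaf(reference):
    """Return True if the reference has a 'stated in' snak pointing at VIAF."""
    for snak in reference.get("snaks", {}).get(STATED_IN, []):
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and value.get("id") == VIAF:
            return True
    return False


def plan_edits(entity):
    """Return (action, property, statement id) tuples; nothing is edited here."""
    plan = []
    for prop in DATE_PROPS:
        for statement in entity.get("claims", {}).get(prop, []):
            refs = statement.get("references", [])
            viaf_refs = [ref for ref in refs if cites_viaf(ref)]
            if not viaf_refs:
                continue  # not sourced from VIAF at all: leave untouched
            if len(viaf_refs) == len(refs):
                # VIAF is the only source: the whole statement goes
                plan.append(("remove_statement", prop, statement["id"]))
            else:
                # other sources remain: drop only the VIAF reference
                plan.append(("remove_reference", prop, statement["id"]))
    return plan
```

Returning a plan instead of editing directly would make it possible to review a batch before anything is written to Wikidata.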
I already removed (edits) those dates from VIAF that were marked "flourished". They were wrongly imported as dob/dod. Other dob/dod statements imported from VIAF may be correct, but VIAF is not a suitable source, as it discards date information found in other authority files and incorporates information from Wikidata. Some common errors are: missing circa, wrong precision (year instead of century), flourished dates not marked as such, and more. In the future the dates should be added directly from the relevant authority control file.
Some examples of dubious or wrong dates from VIAF (query):
- Adaeus (Q346767)
- Giacomo Marzari (Q23823709)
- Stanisław Zaborowski (Q9343414)
- Jean-Charles Ellaume (Q3164651)
@Magnus Manske: you originally added most of these statements. @Jura1, Epìdosis: from prior discussion at Wikidata:Bot requests.
If there is no objection to the removal of these statements I will start the bot on Friday. --Pyfisch (talk) 23:40, 5 October 2020 (UTC)
- Strong support --Epìdosis 06:19, 6 October 2020 (UTC)
- Thanks for doing these. Most helpful. From the above query, I checked a few rdfs exports at VIAF. They generally have the date and one or several sources where it could come from. Generally, it can be found on one of them. Sometimes this is LoC or idref (both tertiary sources), but it could also be ISNI or dbpedia, which would probably make VIAF a quintary source. Obviously, others can have the same problem, e.g. a LoC entry has several references without the dates being attributed to one of them. To sum it up: I'd also deprecate (if no other ref is present) or remove these references/statements. --- Jura 10:40, 6 October 2020 (UTC)
- BTW, when we import dates from VIAF members, the first ones I would consider are the following: GND ID (P227), Library of Congress authority ID (P244), Bibliothèque nationale de France ID (P268), IdRef ID (P269). --Epìdosis 22:34, 7 October 2020 (UTC)
- @Magnus Manske: would these be re-imported by some tool? --- Jura 10:40, 6 October 2020 (UTC)
- I really fear these statements would be (at least in part) reimported by @Reinheitsgebot: from MnM catalog 2050. An option should be added to MnM: it should be possible to mark a catalog as not suitable for the automatic addition of references based on it; this option would also be very useful for CERL Thesaurus ID (P1871) (= catalog 1640), which isn't an independent source either, and for other catalogs. --Epìdosis 12:48, 6 October 2020 (UTC)
- Unfortunately Support As explained above, and as I have seen in items, there are too many bad claims in this import from VIAF. --Shonagon (talk) 16:25, 6 October 2020 (UTC)
- Comment After some thought, I think it's preferable to keep the statements that were correctly imported from VIAF and only deprecate them when the statements are known to be incorrect. VIAF's approach isn't much different from other tertiary sources mentioned above, i.e. LOC, CERL or GND would be preferable with their secondary source, notably for GND that has become a wiki.
The statements Pyfisch removed in the initial batch were different: there we knew bots had imported them incorrectly into Wikidata. --- Jura 17:28, 8 October 2020 (UTC)
- There is a bunch of dates already labeled "circa" by VIAF, but this qualifier is missing for these dates on Wikidata. In addition, dates that are stated as "19.." or "20th century" in the sources VIAF uses are recorded in VIAF as 1950 and imported into Wikidata. This issue equally applies to dates with decade precision. While I can't be sure that the data for people with "date of birth: 1950" in VIAF is wrong, as there are people who were actually born in 1950, it is very likely. --Pyfisch (talk) 09:24, 16 October 2020 (UTC)
- Comment @Pyfisch, Jura1, Shonagon, Epìdosis: I don't know what was eventually done. I made some graphs showing the issue with VIAF data. I agree with Jura that the statements should not be removed but deprecated, so they can be easily tracked. One solution could be to deprecate on Wikidata the dates of birth and death with year precision (at least 1950) and with only VIAF as reference, with reason for deprecated rank (P2241) set to a new item that is an instance of Wikibase reason for deprecated rank (Q27949697) and links to this discussion. -- Envlh (talk) 11:49, 19 May 2021 (UTC)
- @Envlh, Pyfisch, Jura1, Shonagon: Hi! I agree that the case of people (not) born in 1950 (and, probably on a smaller scale, also in 1850 and so on further back) is particularly annoying; among other problems, false 1950 values very often end up in Google knowledge graphs and then people struggle to understand why (e.g. https://twitter.com/wikigamaliel/status/1381666020998385664, https://twitter.com/mbennardo/status/1010178293142884352, https://twitter.com/vladsavov/status/1401149918480199681, https://twitter.com/PeterLodewijk/status/1063371876100132864). I tried to arrange a solution today: systematic removal of 2947 date of birth (P569) "1950" statements sourced from Virtual International Authority File (Q54919) (found through https://w.wiki/43ad) using QS + QS; addition using QS, in the same 2947 items, of date of birth (P569) "20. century" sourced from Virtual International Authority File (Q54919) (leaving it blank would risk allowing the wrong 1950 to be reimported). --Epìdosis 16:12, 10 September 2021 (UTC)
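For illustration, the replacement described in the last comment could be expressed as a pair of QuickStatements commands per item. This is a rough sketch, not the batches that were actually run: it assumes the tab-separated QuickStatements V1 syntax (a leading `-` to remove a statement, `S248` for a "stated in" reference) and the Wikibase precision codes 9 (year) and 7 (century). The helper names and the default `century_year` written into the century-precision value are assumptions and should be checked against how Wikibase displays the result.

```python
VIAF = "Q54919"     # Virtual International Authority File
YEAR = 9            # Wikibase precision code: year
CENTURY = 7         # Wikibase precision code: century


def qs_time(year, precision):
    """Format a year as a QuickStatements time value with the given precision."""
    return f"+{year:04d}-00-00T00:00:00Z/{precision}"


def replacement_commands(qid, wrong_year=1950, century_year=2000):
    """Build two QS lines: remove the year-precision date, add a century one.

    century_year is an assumption; which year best represents a century
    in a century-precision value should be verified before running a batch.
    """
    remove = "\t".join([f"-{qid}", "P569", qs_time(wrong_year, YEAR)])
    add = "\t".join(
        [qid, "P569", qs_time(century_year, CENTURY), "S248", VIAF]
    )
    return [remove, add]


if __name__ == "__main__":
    # Q4115189 is the Wikidata sandbox item, used here as a placeholder.
    for line in replacement_commands("Q4115189"):
        print(line)
```

Keeping a century-precision statement in place, rather than leaving the property empty, follows the reasoning in the comment above: an empty property would invite a re-import of the wrong 1950 value.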