Wikidata:Requests for comment/Handling of stored IDs after they've been deleted or redirected in the external database

An editor has requested the community to provide input on "Handling of stored IDs after they've been deleted or redirected in the external database" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

Sometimes external databases/services erroneously create more than one entry for the same topic. Wikidata users searching for and adding the IDs routinely add all the IDs they find to the item, since they are all valid IDs. Later, when the database/website gets around to fixing this redundancy, the entries/IDs are either turned into redirects or deleted outright.

The question is: what should Wikidata do with stored external IDs that have been turned into redirects or deleted? Keep them stored or simply delete them?

Over the years there have been various discussions and disputes between users over the handling of such cases. Unfortunately, official/help pages don't offer clear-cut instructions for this case, which is why I'm starting this RFC to find a community consensus. Some users (including me) think that these IDs still provide value and should be kept. The outdated IDs get marked as deprecated, with additional qualifiers like reason for deprecated rank (P2241): redirect (Q45403344) or reason for deprecated rank (P2241): link rot (Q55396945). That way, they can still be found via queries, but are no longer flagged as single-value constraint violations. Other users have insisted on removing such old IDs, as they no longer consider them valid IDs.

Among the problems that arose in the past were

  • a) since most properties for external IDs have a single-value constraint (Q19474404), two or more IDs are automatically flagged as constraint violations and get listed on the constraint violations report page. While marking outdated IDs as deprecated removes the constraint violation flags on the item page, the bots updating the constraint violations pages unfortunately fail to recognize the deprecated status and treat these values like values with normal rank. As a result, deprecated values are erroneously listed as violations in the reports, prompting users trying to fix errors to simply delete these deprecated IDs in order to empty the error list.
  • b) some bots have been programmed to check whether values of some external ID properties are redirects and then automatically replace these IDs with the IDs they are redirected to. This means that instead of the current plus old values, there are suddenly several identical values stored for the same property, which in turn automatically triggers other bots that remove redundant ID values. So in the end, older IDs that have been turned into redirects are removed without any human input.
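
The two problems above can be illustrated with a minimal Python sketch. It is not the code of any actual bot: the `Statement` class, the function names and the example values are assumptions made purely for illustration. The first function shows a single-value check that skips deprecated statements (roughly what the report-generating bots would need to do), and the second shows how replacing redirects and then removing duplicates silently loses the old ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified model of one external-ID statement."""
    value: str
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def violates_single_value(statements, ignore_deprecated=True):
    """Return True if more than one distinct value counts toward the constraint."""
    counted = {
        s.value
        for s in statements
        if not (ignore_deprecated and s.rank == "deprecated")
    }
    return len(counted) > 1


def replace_redirects_then_dedupe(values, redirects):
    """Problem b): resolve every redirect, then drop the resulting duplicates."""
    resolved = [redirects.get(v, v) for v in values]
    return list(dict.fromkeys(resolved))  # keeps first occurrence, preserves order


# Illustrative placeholder values, not real identifiers.
statements = [
    Statement("new-id"),
    Statement("old-id", rank="deprecated"),
]
```

With `ignore_deprecated=False` the same two statements are reported as a violation, which is the behaviour described in a); the second function returns only the current ID, as described in b).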

Personally, I think deleting these IDs instead of simply marking them as outdated doesn't really offer us any advantages. They were valid IDs at one point, and we usually don't delete outdated data; we use ranks to indicate the current/best value. Likewise, we don't delete properties for external IDs when the external source is no longer around or changes to a different ID: we keep them for historical purposes and because there's still value in keeping them for our users. Our users might have datasets which still contain such IDs (including outdated ones) and can still use Wikidata to identify the subject of the deleted ID or even query the respective new IDs. Since Wikidata's importance and value as a central Linked Data Hub has grown immensely over the years, external IDs stored here aren't used just to create links on Wikimedia projects. Many third-party databases/services/websites or projects use such IDs and so they can be used (just as intended) for reconciliation between Wikidata and these external datasets. So if a dataset contains external IDs like Library of Congress authority ID (P244) or IMDb ID (P345), be they current or outdated IDs, these datasets can be easily matched to the corresponding Wikidata item. But if IDs are stored here and considered valid IDs one day, but get deleted as outdated the next day, that doesn't speak well for Wikidata's reliability. --Kam Solusar (talk) 16:41, 6 June 2020 (UTC)
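
The reconciliation argument can be made concrete with a small Python sketch. It works on an in-memory dictionary rather than on Wikidata itself; the item key `Q_EXAMPLE`, the ID values and both function names are hypothetical and only serve to show why a retained deprecated ID still lets an outdated dataset find its item and the current ID.

```python
# Hypothetical item data: item key -> list of (external ID value, rank).
ITEMS = {
    "Q_EXAMPLE": [("id-2020", "normal"), ("id-2015", "deprecated")],
}


def build_index(items):
    """Map every stored ID, current or deprecated, to its item."""
    return {value: qid for qid, stmts in items.items() for value, _rank in stmts}


def reconcile(external_id, items, index):
    """Return (item, current IDs) for an ID, or None if the ID is not stored."""
    qid = index.get(external_id)
    if qid is None:
        return None
    current = [value for value, rank in items[qid] if rank != "deprecated"]
    return qid, current
```

If the deprecated statement is deleted from the item instead, the same lookup for the old ID returns `None`, and the third party has to resolve the ID against the original database on its own.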

General discussion

  •  Keep them, please.
    The main purpose of identifiers is *identification*, which works perfectly with redirecting URLs and even dead URLs. We do much better if we do not remove such identifiers, but use ranks instead to control visibility of multiple identifiers for the same property. Keeping them would be beneficial for external users, query federation, and in fact for us as well, since it is a preventive measure against the redirecting identifier being re-added after removal. (On a side note, I would prefer to use "preferred rank" for the current identifier and "normal rank" for the redirecting/dead identifier, in contrast to the recommendation above and current practice; the redirecting/dead identifier used to be correct, thus it does not qualify for deprecation in my opinion).
    We even have the single-best-value constraint (Q52060874) as a variant of the single-value constraint (Q19474404) for such cases, but unfortunately it never really took off. Main problem is that the constraint violation reports are updated by a bot framework that has at best scarce support for rankings, and thus makes poor reports for every constraint type that involves rankings. For that reason, perfectly fine data situations are often impossible to remove from the covi report, and we have made some poor strategic decisions involving rankings and management of problematic content in the past. —MisterSynergy (talk) 22:14, 7 June 2020 (UTC)[reply]
    MisterSynergy, maybe as a pilot project WD could test how to best do it for GND ID (P227) on P31=Q5. Each redirect should be clearly related to the target ID, the item is not enough if WD has one item for multiple GND items. MrProperLawAndOrder (talk) 22:40, 8 June 2020 (UTC)[reply]
  •  Keep Additional deprecated values are essentially harmless (unless there are thousands of them on one item, in which case I can see wanting to delete them). However, the bot that generates the static constraint reports should be fixed to ignore deprecated values! How do we get that to happen? ArthurPSmith (talk) 15:53, 8 June 2020 (UTC)[reply]
  •  Keep --Adam Harangozó (talk) 16:41, 8 June 2020 (UTC)[reply]
  •  Keep, Wikidata is independent on these external sets of data and it might be wiser to create a special statement for such instances than to outright delete them. There is no additional harm in storing them. -- Donald Trung/徵國單  (討論 🀄) (方孔錢 💴) 17:16, 8 June 2020 (UTC)[reply]
  •  Keep Tfmorris1 (talk) 19:53, 8 June 2020 (UTC)[reply]
Vladimir Alexiev (talk) 11:59, 13 March 2017 (UTC) Jonathan Groß (talk) 17:52, 26 March 2017 (UTC) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits Jneubert (talk) 13:47, 29 April 2017 (UTC) Sic19 (talk) 20:42, 12 July 2017 (UTC) Wikidelo (talk) 21:15, 8 May 2018 (UTC) ArthurPSmith (talk) 19:52, 22 August 2018 (UTC) PKM (talk) 19:40, 23 August 2018 (UTC) Ettorerizza (talk) 06:44, 8 October 2018 (UTC) Fuzheado (talk) 03:47, 19 December 2018 (UTC) Daniel Mietchen (talk) 16:30, 7 April 2019 (UTC) Iwan.Aucamp (talk) 21:48, 3 October 2019 (UTC) Epìdosis (talk) 23:49, 22 November 2019 (UTC) Sotho Tal Ker (talk) 00:52, 1 May 2020 (UTC) Bargioni (talk) 09:48, 02 May 2020 (UTC) Carlobia (talk) 14:34, 11 May 2020 (UTC) Pablo Busatto (talk) 03:22, 23 June 2020 (UTC) Matlin (talk) 10:53, 6 July 2020 (UTC) Msuicat (talk) 21:57, 27 August 2020 (UTC) Uomovariabile (talk) 10:04, 27 October 2020 (UTC) Silva Selva (talk) 17:21, 30 November 2020 (UTC) 1-Byte (talk) 15:52, 14 December 2020 (UTC) Alessandra.Moi (talk) 17:26, 16 February 2021 (UTC) CamelCaseNick (talk) 21:20, 20 February 2021 (UTC) Songceci (talk) 18:45, 24 February 2021 (UTC)]] moz (talk) 10:48, 8 March 2021 (UTC) AhavaCohen (talk) 14:41, 11 March 2021 (UTC) Kolja21 (talk) 17:37, 13 March 2021 (UTC) RShigapov (talk) 14:34, 19 September 2021 (UTC) Jason.nlw (talk) 15:15, 30 September 2021 (UTC) MasterRus21thCentury (talk) 20:22, 18 October 2021 (UTC) Newt713 (talk) 08:42, 13 March 2022 (UTC) Pierre Tribhou (talk) 08:00, 20 March 2022 (UTC) Powerek38 (talk) 17:21, 14 April 2022 (UTC) Ahatd (talk) 08:34, 4 August 2022 (UTC) JordanTimothyJames (talk) 00:54, 31 August 2022 (UTC) --Silviafanti (talk) 17:07, 14 September 2022 (UTC) Back ache (talk) 02:03, 1 November 2022 (UTC) AfricanLibrarian (talk) M.roszkowski (talk) 10:44, 4 January 2023 (UTC) Rhagfyr (talk) 19:36, 9 January 2023 (UTC) — Haseeb (talk) 13:10, 4 August 2023 (UTC) 13:26, 15 November 2023 (UTC) MrBenjo (talk) 15:20, 23 April 2024 (UTC)[reply]

Notified participants of WikiProject Authority control

  • First of all, thanks to @Kam Solusar: for this much-needed RfC on such a complex topic. Just a few days ago I opened two discussions on specific properties, one about VIAF ID (P214) and one about GND ID (P227), but a more general discussion is probably better for finding a balanced solution.
    So, first of all I would make a distinction between:
    1. aggregators: identifiers which contain no native information, but only aggregate other identifiers (notable example: VIAF ID (P214))
    2. sources: identifiers which contain native information (the great majority, e.g. GND ID (P227), Library of Congress authority ID (P244), IMDb ID (P345) etc.)
    I think these two categories of IDs deserve different treatment. As Kam said, we keep properties of external IDs which have become obsolete "for historical purposes and because there's still value in keeping them for our users". So, let's reason about why external users use IDs: if an external user uses an aggregator, they use it because it aggregates different sources; if an external user uses a source, they use it for the information it contains in itself. The same is also true for Wikidata: we use aggregators because they aggregate different sources, and we use sources because of their intrinsic value. Let's go further: if Wikidata contains not only an aggregator, but also all the sources aggregated by it (which is the case for VIAF ID (P214) and VIAF members), the external source can use Wikidata itself as the aggregator, with no further need for the previous aggregator; so it becomes evident that keeping redirected and deleted entries of the aggregator is useless. Another aspect, valid at least for VIAF ID (P214): most of the duplicate IDs which eventually get merged are minimal (see e.g. the 11 VIAF ID (P214) values in Alaric I (Q102371); unfortunately, items having more than 10 VIAFs are not as exceptional as they might seem), usually containing only one source, and they have basically no external use outside Wikidata, so keeping them all (after redirection) would make the item inconveniently heavier and less readable, with no real benefit for third parties. In conclusion, I strongly support  Delete for aggregators' redirected/obsolete IDs when the sources aggregated by them are external IDs stored on Wikidata.
    Now, let's come to the great majority of external IDs, those I called sources. Here the problem is more complex, so the pros of each solution should be evaluated. However, I should make one consideration first (point 0): I would consider it inconsistent for Wikidata to have just some redirects and not others; in other words, if we choose to keep redirected/deleted IDs (obviously marking them as deprecated and qualifying them with reason for deprecated rank (P2241)), it would also be more consistent to program, whenever possible, some systematic addition of redirected IDs to our items. If we choose completeness, why should we have only IDs which became obsolete after having been added to Wikidata? IDs which became obsolete before being added to Wikidata would also deserve addition, and there would not be any clear reason for reverting users who add obsolete IDs to items. If we consider how large the number of redirects resolved in the history of big databases such as GND ID (P227), Library of Congress authority ID (P244) and IMDb ID (P345) is, the quantity of obsolete information to be stored becomes relevant (see the above comment by @ArthurPSmith: "unless there are thousands of them on one item, in which case I can see wanting to delete them"). Now I will try to list some pros and cons, with objections:
    1. Pros for keeping obsolete IDs (I quote the good overview by @Kam Solusar:):
      1. "Our users might have datasets which still contain such IDs (including outdated ones) and can still use Wikidata to identify the subject of the deleted ID or even query the respective new IDs"
        1. objections: true, third-party sources can compare their IDs with Wikidata IDs, see that some IDs are deprecated on Wikidata and delete them on that basis; however, an alternative way of operating is possible if Wikidata chooses to delete obsolete IDs: third-party sources compare their IDs with Wikidata IDs, see that some IDs are absent on Wikidata, check by themselves whether the IDs are still valid and easily see that they have become obsolete. The result is the same, with only a little more trouble, or maybe less trouble (quoting @Sotho Tal Ker: regarding GND ID (P227): "If a clean up of their data is needed, I would advise those third parties to use the primary database directly, not any secondary one which usually lags behind a bit")
      2. "Many third-party databases/services/websites or projects use such IDs and so they can be used (just as intended) for reconciliation between Wikidata and these external datasets"
        1. objections: this is the best point in my opinion; nevertheless, if Wikidata deleted obsolete IDs, reconciliation would become more difficult (only in the relatively few cases where these IDs were used by third-party sources), but not totally impossible
      3. "But if IDs are stored here and considered valid IDs one day, but get deleted as outdated the next day, that doesn't speak well for Wikidata's reliability"
        1. objections: storing obsolete IDs can make updating these IDs easier for third-party sources using Wikidata, but in my opinion reliability mainly means correctly storing current IDs
      4. "it is a preventive measure against the redirecting identifier being re-added again after removal" (@MisterSynergy:)
        1. objections: good point, but such additions can easily be found through constraint-violations and/or by bots
    2. Pros for removing obsolete IDs (I quote some good ideas by @Sotho Tal Ker: about GND ID (P227)):
      1. "There is no use in keeping distinct values if they actually point to the same source" (cf. when a Wikipedia page is moved, the information about its previous name isn't kept) + "Keeping redirects will waste computing power as these redirects have then to be resolved externally"
        1. objections: basically true, they are redundant, but they can still be useful for third-party sources (see above 1.1 and 1.2)
      2. "A cleaner database": obsolete IDs occupy space (not so few, considering point 0 above) and make pages longer
        1. objections: the space occupied is maybe not so much (here some estimate on some properties would be useful to have a more precise idea of the dimensions involved)
    Given all the previous points, weighing pros and cons, I tend to support  deletion also for sources' redirected/obsolete IDs. --Epìdosis 21:36, 8 June 2020 (UTC)
    @Epìdosis: do you know about the massive re-use of GND IDs, see e.g. de:Wikipedia:BEACON? Do you know that Deutsche Biographie also stores the old GND IDs, to allow long-term linking by old values? If WD doesn't do that, it is very bad for users linking to WD by GND ID. MrProperLawAndOrder (talk) 22:32, 8 June 2020 (UTC)
  •  Keep for widely re-used source IDs such as GND ID and ISNI. No opinion on VIAF yet. No opinion on minor third party IDs yet. MrProperLawAndOrder (talk) 22:09, 8 June 2020 (UTC)[reply]
    • @MrProperLawAndOrder: I think that a distinction between widely reused sources (ISNI (P213) and GND ID (P227)) and less widely reused (= "minor") sources can be useful, although we should reach a clear agreement about the definition of "widely reused".
    • de:Wikipedia:BEACON is very interesting, of course. Possible solution: we can keep obsolete IDs only for a given time after obsolescence (e.g. 2 or 5 or 10 years, to be decided) in order to allow third parties to substitute them with valid ones.
    What do you think about these two points? --Epìdosis 22:54, 8 June 2020 (UTC)[reply]
    @Epìdosis: 10 years in internet, that is good :-) Yes, a definition is needed. And we should find out how to manage the data. I suggest starting with humans and with GND, because there the redirects can be downloaded from the source and GND IDs are widely re-used. The beacon files could be analyzed to find out if the sources substitute or not. MrProperLawAndOrder (talk) 23:23, 8 June 2020 (UTC)[reply]
  • Just for completeness, I report here the opinion which emerged in the discussions on VIAF ID (P214) and on GND ID (P227):
    Now I would like to ask two questions to the users (@ArthurPSmith:, @Adam Harangozó:, @Donald Trung:, @Tfmorris1:) who voted "keep" before my comment, in order to have a clearer understanding of their position:
    1. given the distinction I've made between aggregators and sources, would you keep obsolete IDs of both types or just of one type?
    2. would you support only "already-present obsolete IDs should be deprecated instead of deleted" or also "already-present obsolete IDs should be deprecated instead of deleted and it is fine to add ex novo other deprecated obsolete IDs"?
    @MisterSynergy: has already answered both questions, stating that he would like to keep all obsolete IDs and also would support the addition ex novo of other obsolete IDs. Thank you very much, --Epìdosis 09:40, 9 June 2020 (UTC)[reply]
    @Epìdosis: 1. keep both, I don't see such a huge distinction, though if a source is routinely churning through its identifiers we might want to question using it as an external id at all here; On 2. No, do not add "new" obsolete IDs, my opinion is just that when an ID is added in Wikidata it should persist, at least for a while (in 10 years we can look at it again of course!) ArthurPSmith (talk) 13:52, 9 June 2020 (UTC)[reply]
    Keep both. The valuable metadata that VIAF adds is the equivalency of IDs, as denoted by their "clusters." No, don't add newly discovered obsolete IDs, that's both extra (wasted) work and higher risk of bad reconciliation. My normal practice is to deprecate the old identifier and add the new identifier as preferred. Tfmorris1 (talk) 17:22, 9 June 2020 (UTC)[reply]
  • Keep them all, marking as deprecated where applicable. I'm getting tired of being asked this same question over and over again, in different venues. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:52, 9 June 2020 (UTC)[reply]
    @Pigsonthewing: This should be the last, hopefully :) Just a clarification: would you support only "already-present obsolete IDs should be deprecated instead of deleted" or also "already-present obsolete IDs should be deprecated instead of deleted and it is fine to add ex novo other deprecated obsolete IDs"? Thanks, --Epìdosis 22:59, 9 June 2020 (UTC)
    But it isn't, is it. Having found virtually no support here, you've posted, immediately below and within 24 hours, another lengthy reframing of the question. And pinged everyone again. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:30, 10 June 2020 (UTC)[reply]
    It's not a problem of support, as you can see from the proposals I've made, which differ very much from my above post. The problem concerns the aspects on which most users (you included) have not yet expressed their opinion, mainly the following question: "already-present obsolete IDs should be deprecated instead of deleted" or also "already-present obsolete IDs should be deprecated instead of deleted and it is fine to add ex novo other deprecated obsolete IDs"? There are also other minor problems, which I have tried to list in the operative proposals below. A generic "keep" cannot be the conclusion of such an RfC, since it doesn't deal with many important operative issues. Thanks, --Epìdosis 17:38, 10 June 2020 (UTC)
  •  Keep Deprecated values are valuable information for our users. If they look up an (old) identifier that is now a redirect, they know which item it belongs to and can look up the current identifier. An authority control hub without such information is incomplete. Raymond (talk) 08:53, 10 June 2020 (UTC)
  •  Keep - mark as deprecated and add reason for deprecated rank (P2241) qualifier. It can be useful to track changes to identifiers and this is difficult when the information disappears. Simon Cobb (User:Sic19 ; talk page) 17:16, 10 June 2020 (UTC)
  • See Wikidata_talk:Identifiers#Deprecate_or_remove. There are currently a few separate discussions about specific identifiers and their quirks. --- Jura 13:02, 11 June 2020 (UTC)[reply]
    @Jura1: Thanks for reminding me of this discussion, I honestly didn't remember its existence. Fortunately this should be the general discussion that finally sets clear guidelines about this problem. --Epìdosis 15:20, 11 June 2020 (UTC)
  •  Keep External databases may use another external database's ID as a key to find the Wikidata entry, so external IDs are part of an item's identification. Keeping historical records will make it possible to match links made by outdated external documents. --Framawiki (please notify !) (talk) Sorry for my bad English :) 21:05, 11 June 2020 (UTC)
  • I am still in favor of removal. Not that I care too strongly, the arguments just do not convince me. If we want redirects to be stored, then ANY redirect should be added, even those which are not in Wikidata yet. This "only keep what we have but do not add new redirects" does not work well with the reasons stated by those who want to keep redirects. If redirects are kept, property constraints should be adapted as well. --Sotho Tal Ker (talk) 21:09, 12 June 2020 (UTC)
  •  Keep outdated identifiers. Mark current ones as preferred. BrokenSegue (talk) 17:44, 13 February 2022 (UTC)
  •  Keep I also have a clear vote to keep identifiers that are no longer up-to-date. I am clearly against any deletion of identifiers, especially if they have already been maintained and set to a deprecated rank. --Gymnicus (talk) 09:11, 17 March 2022 (UTC)[reply]
  •  Keep I entirely agree that these should be kept – and also added post factum where appropriate – as deprecated statements.
    There's an argument that redirects are still valid identifiers and therefore shouldn't be deprecated (but rather the most correct value marked as preferred), but I don't have a particularly strong opinion about that. It would require some other changes since reason for deprecated rank (P2241) wouldn't apply to those redirect statements then, so it might not be worth the subtle distinction.
    I also support keeping values which are invalid (e.g. deleted) and adding them when they are already dead at the time of insertion into Wikidata. They may be recorded elsewhere, in which case the association with the corresponding Wikidata item is still useful. I do think post factum additions require a good source though, such as an old export of the external DB, a Wayback Machine snapshot of the DB's website, or similar – not just a single mention somewhere else on the web, which may easily be erroneous. –JustAnotherArchivist (talk) 18:45, 17 August 2022 (UTC)[reply]
  •  Keep I bet somebody already said this, but if so, here is my reiteration: keeping deprecated IDs prevents tools like Wikidata for Web (Q99894727) from accidentally re-adding them. –Shisma (talk) 17:44, 29 December 2023 (UTC)

slug redirect IDs

I generally support the conclusion reached here to keep stable redirect IDs, but it seems to me that the exact opposite approach should be taken to slug IDs. These are IDs that are generated from an item's name and that change every time the item name is changed. Over time, a dozen redirect IDs can be generated for a single actual ID. The situation is complicated by the fact that some of the older redirects to one ID later become actual working IDs for other subjects, ceasing to be redirects.

For example, the properties P6127 and P5794 currently have this type of ID (as do other properties related to these sites). With Letterboxd we currently have a practice of keeping older IDs and marking them with the deprecation reason "redirect", whereas for IGDB there is a bot that keeps track of these IDs and keeps them up to date, replacing outdated ones with current ones and not leaving any deprecated ones. (Letterboxd could be handled by a bot in the same way, and I proposed the algorithm to the creator of the bot that handles these IDs at User talk:Carlinmack#Suggestion for AddLetterboxdFilmIdBot, but unfortunately they did not respond.)

With Letterboxd, I periodically do complete validity checks of the IDs, and on average each month 30 outdated redirect IDs stored on WD have become valid working IDs by the next check. Maintaining endless redirects to an up-to-date ID in such cases is effectively useless and only adds to the difficulty of tracking them to avoid duplicates. Here I suggested that we stop storing them for this particular property: Property talk:P6127#Keeping deprecated redirect IDs?, leaving only the current ones for every Qitem. It seems to me that Wikidata should adopt the practice of removing endless slug redirect IDs. Keeping all title variants as separate deprecated IDs and making sure they don't become working IDs for other entities is a practice that creates more problems than benefits.

Note that I was only talking about redirect IDs, and not withdrawn identifier value (Q21441764) or other types. The former are generated on a constant basis in huge numbers, while the latter most often mean a break in the generation chain. Therefore, withdrawn IDs must be maintained and processed in the usual way. Their maintenance and processing does not create the kind of problem that the endless redirect variants of the actual ID do.

P.S. Perhaps for slug IDs we could also create a separate type of Wikidata property change frequency (Q23611439) for stability of property value (P2668), which would specify that storing redirect IDs is not needed. Solidest (talk) 09:45, 11 January 2024 (UTC)
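
The periodic validity check described above for slug IDs could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the `resolve` callback (standing in for whatever lookup against the external site is used) and the example slugs are all hypothetical, and no real site API is implied.

```python
def find_reclaimed_slugs(deprecated_slugs, resolve):
    """Flag stored redirect slugs that no longer lead to the item's current slug.

    deprecated_slugs: item key -> list of (old_slug, current_slug) pairs.
    resolve: callable returning the slug an ID currently resolves to
             (the slug itself if it is a live page), or None if it is gone.
    """
    reclaimed = []
    for item, pairs in deprecated_slugs.items():
        for old_slug, current_slug in pairs:
            target = resolve(old_slug)
            if target is not None and target != current_slug:
                # The old slug now identifies a different entry on the site.
                reclaimed.append((item, old_slug, target))
    return reclaimed
```

A plain dictionary's `get` method can serve as the `resolve` callback when trying this out, for example a mapping in which one old slug now points to itself because it has become the page of another subject.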

Operative proposals

Please comment below, choosing the operative proposal you judge most appropriate and highlighting any points which in your opinion need further debate. While it is already clear that a generic keep is widely considered the best option, other aspects still need to be clarified (e.g. the possibility of inserting obsolete IDs ex novo, on which only a few users have expressed an opinion). Thanks, --Epìdosis 17:27, 10 June 2020 (UTC)

First operative proposal

So, I’ve continued reflecting upon this complex topic, in order to elaborate a first (obviously improvable) operative proposal, which I divide into numbered points to make it easier to criticize specific parts of it. I especially want to thank @MisterSynergy: and @MrProperLawAndOrder: for their criticism and @Bargioni: for his oral suggestions, which I think helped me form a more balanced overview of the problem, substantially changing my view of the importance of keeping these IDs, at least for a certain time, in order to allow external databases to keep using them.

The main objective of this proposal, given that there seems to be strong consensus about keeping obsolete external IDs in general, is now to raise attention regarding three points which still don’t always emerge in the opinions expressed above:

So, here is my proposal.

  1. when an external database has an obsolete (redirected or withdrawn) ID, you should act in one of the following two ways:
    1. case 1: the external database is instance of (P31): Wikidata property widely reused by third-party entities (Q96192295), with at least five references which demonstrate how widely it is used externally (e.g. VIAF ID (P214), GND ID (P227)) – since the reason we keep obsolete external IDs is to give external databases the possibility of fixing them, we should make sure there are some external databases interested in the IDs we keep
      1. if an ID is still valid, it should have normal rank and should possibly be placed in first place
      2. if an ID already present has been redirected, it should have deprecated rank + the qualifier reason for deprecated rank (P2241): redirect (Q45403344)
      3. if an ID already present has been withdrawn, it should have deprecated rank + the qualifier reason for deprecated rank (P2241): withdrawn identifier value (Q21441764)
      4. if an ID has been redirected or withdrawn before being added to the item, it should not be added to the item directly as deprecated
        1. an exception is made for redirected or withdrawn IDs which have been removed in the past: they can be reinserted directly as deprecated, because they were once present in the item
    2. case 2: the external database isn’t instance of (P31): Wikidata property widely reused by third-party entities (Q96192295) or has fewer than five references which support this statement
      1. if an ID is still valid, it should have normal rank
      2. if an ID has been redirected or withdrawn, it should simply be deleted
  2. technical notes:
    1. the bots updating constraint violation lists should be instructed to ignore deprecated values
    2. the bots updating items should mark IDs as redirected or withdrawn in the way mentioned above; if an ID has been redirected, they should also add the valid ID using normal rank and possibly placing it in first place
    3. the following option should be added to Special:Preferences and possibly enabled by default: in the external-IDs section, all deprecated values should be collapsed, with a button on each property to show them – the reasons are the following: most users aren't interested in viewing obsolete IDs, which make long pages even longer and may confuse readers; additionally, collapsing these IDs reduces the risk that newbies, not understanding their usefulness, remove them in good faith or accidentally
  3. final note: in 2025 the choice between continuing to keep the IDs deprecated in 2020 or before and deleting them will be the subject of a new RfC
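
The decision rules in point 1 can be summarised as a small Python sketch. It is only one reading of the proposal, not an implementation used by any bot: the function name, its parameters and the action labels are hypothetical, and returning "skip" for an obsolete ID that is not on the item in case 2 is an assumption, since the proposal only says such IDs should be deleted.

```python
REDIRECT = "Q45403344"   # redirect
WITHDRAWN = "Q21441764"  # withdrawn identifier value
MIN_REFERENCES = 5       # threshold named in case 1


def proposed_action(status, on_item, widely_reused, references, removed_before=False):
    """Return (action, rank, reason for deprecated rank) for one ID.

    status: "valid", "redirected" or "withdrawn".
    widely_reused: whether the property is an instance of Q96192295.
    references: number of references supporting that statement.
    """
    if status == "valid":
        return ("keep", "normal", None)
    if status not in ("redirected", "withdrawn"):
        raise ValueError(f"unknown status: {status}")
    case_1 = widely_reused and references >= MIN_REFERENCES
    if not case_1:
        # Case 2: obsolete IDs are deleted ("skip" for absent IDs is an assumption).
        return ("delete", None, None) if on_item else ("skip", None, None)
    if on_item or removed_before:
        reason = REDIRECT if status == "redirected" else WITHDRAWN
        return ("keep", "deprecated", reason)
    # Case 1, point 4: obsolete before ever being added, so it is not added.
    return ("skip", None, None)
```

Under this reading, `removed_before=True` covers the exception in point 1.1.4.1: an obsolete ID that was once on the item may be reinserted directly as deprecated.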

Ready to receive your comments! Bye, --Epìdosis 15:19, 10 June 2020 (UTC)


Notified participants of WikiProject Authority control --Epìdosis 16:46, 10 June 2020 (UTC) [reply]

Notified participants of WikiProject Biographical Identifiers --Epìdosis 16:48, 10 June 2020 (UTC) [reply]

JakobVoss (talk) ClaudiaMuellerBirn (talk) Criscod (talk) Daniel Mietchen (talk) Ettorerizza (talk) Ls1g (talk) Pasleim (talk) Hjfocs (talk) 17:24, 21 January 2019 (UTC) PKM (talk) 2le2im-bdc (talk) 20:30, 24 January 2019 (UTC) Vladimir Alexiev (talk) 16:37, 21 March 2019 (UTC) ElanHR (talk) User:Epìdosis (talk) Tris T7 TT me UJung (talk) 11:43, 24 August 2019 (UTC) Envlh (talk) SixTwoEight (talk) User:SCIdude (talk) Will (Wiki Ed) (talk) Mathieu Kappler (talk) So9q (talk) 19:33, 8 September 2021 (UTC) Zwolfz (talk) عُثمان (talk) 16:31, 5 April 2023 (UTC) M2k~dewiki (talk) 12:28, 24 September 2023 (UTC) —Ismael Olea (talk) 18:18, 2 December 2023 (UTC) Andrea Westerinen (talk) 23:33, 2 December 2023 (UTC) Peter Patel-Schneider[reply]

Notified participants of WikiProject Data Quality --Epìdosis 16:49, 10 June 2020 (UTC) [reply]

FYI: I have suggested to Europeana that we create a staging area using en:Wikibase outside Wikidata to better track new, deleted and changed external properties, both in Wikidata and in the external source. The lesson learned is that most external sources are weak regarding version history, change management, talk pages, ping functions, subscriptions...
  • see
    • T251225#6088169 "Change management of entities created and deleted in Europeana"
    • T240738 "More than 1200 Europeana Entities reference deleted Wikidata objects - task identify them but also Europeana need action"
- Salgo60 (talk) 07:04, 13 June 2020 (UTC)[reply]

Discussion OP1

 Support 100%, since I contributed to this proposal. VIAF abandoned the production of the persist file (list of redirects): see the http://viaf.org/viaf/data page. If this proposal is accepted, WD could help catalogs stay in sync with VIAF. --  Bargioni 🗣 15:40, 10 June 2020 (UTC)[reply]
@Bargioni: if the redirects are not provided there, how to obtain the data? MrProperLawAndOrder (talk) 03:28, 11 June 2020 (UTC)[reply]
@MrProperLawAndOrder: VIAF redirects are only available through the use of the link itself. The browser switches to the correct cluster, while a robot will receive the proper info. Something similar occurs for abandoned IDs. --  Bargioni 🗣 10:47, 11 June 2020 (UTC)[reply]
@Bargioni: that means harder to obtain, than e.g. those GND redirects that are in the LDS file / GND dump. MrProperLawAndOrder (talk) 15:26, 11 June 2020 (UTC)[reply]
@MrProperLawAndOrder: No problem for a bot that scans P214 values. For batches like I did up to some months ago, it is impossible starting from the day that VIAF removed the persist file from http://viaf.org/viaf/data. Absurd... --  Bargioni 🗣 15:56, 11 June 2020 (UTC)[reply]
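
The per-link check described in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: that a bot requests the ID's URL without following redirects, that a redirected ID answers with a 3xx status and a `Location` header containing `/viaf/<new ID>`, and that an abandoned ID answers with 404 or 410. None of this is confirmed VIAF behaviour, and `classify_response` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumption: the target ID appears in the Location header as ".../viaf/<digits>".
_ID_IN_URL = re.compile(r"/viaf/(\d+)")


def classify_response(status_code: int, location: Optional[str]) -> Tuple[str, Optional[str]]:
    """Classify one HTTP response obtained WITHOUT following redirects.

    Returns (state, new_id), where state is one of
    "valid", "redirected", "withdrawn" or "unknown".
    """
    if status_code in (301, 302, 303, 307, 308) and location:
        match = _ID_IN_URL.search(location)
        # new_id stays None if the Location header has an unexpected shape.
        return ("redirected", match.group(1) if match else None)
    if status_code in (404, 410):
        return ("withdrawn", None)
    if status_code == 200:
        return ("valid", None)
    # Anything else (5xx, redirect without Location, ...) needs a human look.
    return ("unknown", None)
```

Keeping the classification separate from the network call means the rule can be checked on recorded responses before a bot is let loose on all P214 values.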

Second operative proposal

Same as the first, but simplified by removing the part related to instance of (P31)Wikidata property widely reused by third-party entities (Q96192295), which may be seen as potentially confusing. --Epìdosis 17:21, 10 June 2020 (UTC)[reply]

  1. when an external database has an obsolete (redirected or withdrawn) ID, you should act as follows:
    1. if an ID is still valid, it should have normal rank and should possibly be placed in first place
    2. if an ID already present has been redirected, it should have deprecated rank + the qualifier reason for deprecated rank (P2241)redirect (Q45403344)
    3. if an ID already present has been withdrawn, it should have deprecated rank + the qualifier reason for deprecated rank (P2241)withdrawn identifier value (Q21441764)
    4. if an ID has been redirected or withdrawn before being added to the item, it should not be added to the item directly as deprecated
      1. an exception is made for redirected or withdrawn IDs which have been removed in the past: they can be reinserted directly as deprecated, because they were once present in the item
  2. technical notes:
    1. the bots updating constraint violations lists should be instructed to ignore deprecated values
    2. the bots updating items should mark as redirected or withdrawn the IDs in the way mentioned above; if an ID has been redirected, they should also add the valid ID using normal rank and possibly placing it in first place
    3. the following option should be added to Special:Preferences and possibly enabled by default: in the section external-IDs, all deprecated values should be collapsed, with a button on each property to show them. The reasons are the following: most users aren't interested in viewing obsolete IDs, which make long pages even longer and may confuse readers; additionally, collapsing these IDs reduces the risk that newbies, not understanding their usefulness, remove them in good faith or accidentally
  3. final note: in 2025 the choice between continuing to keep the IDs deprecated in 2020 or before and deleting them will be the subject of a new RfC
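
For bot authors, points 1.1 to 1.4 and technical note 2.1 could be sketched roughly as below. This is a minimal illustration, not an agreed implementation: the property and item IDs are the ones named in the proposal, while `Status`, `Decision`, `decide` and `single_value_violation` are hypothetical names, and a statement is reduced to a simple (value, rank) pair.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

REASON_PROPERTY = "P2241"       # reason for deprecated rank
REASON_REDIRECT = "Q45403344"   # redirect
REASON_WITHDRAWN = "Q21441764"  # withdrawn identifier value


class Status(Enum):
    VALID = "valid"
    REDIRECTED = "redirected"
    WITHDRAWN = "withdrawn"


@dataclass
class Decision:
    keep: bool                    # should the ID be (or stay) on the item?
    rank: Optional[str] = None    # "normal" or "deprecated"
    reason: Optional[str] = None  # value for the P2241 qualifier, if any


def decide(status: Status, on_item: bool, removed_in_past: bool = False) -> Decision:
    """Apply points 1.1 to 1.4 of the proposal to a single ID."""
    if status is Status.VALID:
        return Decision(True, "normal")                      # 1.1
    if not (on_item or removed_in_past):
        return Decision(False)                               # 1.4: do not add
    if status is Status.REDIRECTED:
        return Decision(True, "deprecated", REASON_REDIRECT)  # 1.2
    return Decision(True, "deprecated", REASON_WITHDRAWN)    # 1.3


def single_value_violation(statements: List[Tuple[str, str]]) -> bool:
    """Note 2.1: count only non-deprecated values for the single-value check."""
    live = {value for value, rank in statements if rank != "deprecated"}
    return len(live) > 1
```

With this reading, an item holding one normal-rank ID and one deprecated redirect is not reported as a single-value violation, while two normal-rank IDs still are.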

Discussion OP2

  • strike 2025 - anyone can make a request in 2025 anyway, no use in writing this down. Or will this mean, if a proposal doesn't include this, it cannot be reviewed? MrProperLawAndOrder (talk) 03:32, 11 June 2020 (UTC)[reply]
    I thought that fixing the date of a new discussion to take stock of the situation could be useful; anyway, yes, anyone can just open a discussion in 2025 or at another date (every decision can be reviewed, of course), so struck. --Epìdosis 10:52, 11 June 2020 (UTC)[reply]
  • re 1.4 "if an ID has been redirected or withdrawn before being added to the item, it should not be added to the item directly as deprecated" - No. As user:MisterSynergy mentioned on your talk page, redirects are just other names. And in general: Permission to have data in WD should not depend on historic presence of data in WD. MrProperLawAndOrder (talk) 03:40, 11 June 2020 (UTC)[reply]
  • I work a lot on data quality in Wikidata, and I think the question under discussion is very important for Wikidata's future.
  • "the bots updating constraint violations lists should be instructed about ignoring deprecated values" - let us assume that an item was vandalized and a deprecated value was replaced with obscene language. Should the constraint violations list ignore this?
  • We may think that deprecated values do not require any validation. Let's take a look at this edit: [1]. A valid, current code was added with deprecated rank. Should we turn a blind eye to this?
  • We have tons of completely wrong values in Wikidata. Just open the constraints report of any widely used property. Misprints, bugs in import procedures, vandalized values, errors in external databases... They go unfixed for years, because we do not have enough resources to keep all the existing data in a valid state. This suggestion adds much more work: now we would also need to process and validate deprecated values. Who will do this? Why do we think we have enough resources to work on this as well?
  • How do we validate deprecated values? Redirects are not a big problem. But how do we deal with deleted values? Some databases keep deleted values, but others do not. archive.org will help in only a few cases. So deprecated values will not actually be validated. External users will think that Wikidata is a collection of unvalidated garbage. Such values will undermine Wikidata:Verifiability.
  • Redirects. Some databases can list all their redirects, so we can import all of them. But other databases have no public API for this, so we will have only some of the redirects. That is, the item will have incomplete information. Incomplete information = wrong information from some points of view.
  • Widely used deprecated values require much more complex algorithms. For example, let's think about our merge tool. How should it merge this and this item? Both have the value PubMed=5943422. One has this value with normal rank, the other with deprecated rank. Should the tool add both values to the resulting item? Or do nothing and show an error about the conflict? This is a simple example, because we can ask the user. But I am currently working on automatic merge procedures, because we have tens of thousands of items that should be merged. It is already a very hard task because of the different types of errors and inconsistencies in the data. Now I would have to increase the complexity further to process deprecated values, resolve conflicts among them, etc. And this story will repeat again and again in each bot, tool and other instrument. Wikidata is a non-commercial project. Wikidata's success depends on the entry barrier for contributors. Creating a new tool should be a simple process. This determines Wikidata's success.
  • Redirects and deleted values are of interest for only a few kinds of tasks. And we already have the item history page: the few users who really need historical values can analyze the item history. Do we really need to make common tasks more complex in order to make rarely needed tasks simpler?
Ivan A. Krestinin (talk) 23:57, 2 October 2021 (UTC)[reply]
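
To make the merge question above concrete, here is a rough sketch of one possible policy: same value with the same rank is merged silently, same value with different ranks is reported as a conflict for a human to resolve. This is purely illustrative and is not how the existing merge tool behaves; `merge_values` is a hypothetical helper, and each item's statements for one property are reduced to a value-to-rank mapping.

```python
from typing import Dict, List, Tuple


def merge_values(a: Dict[str, str], b: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge the value -> rank mappings of two items for one property.

    Values whose ranks disagree are left as they are in `a` and listed
    as conflicts instead of being resolved automatically.
    """
    merged = dict(a)
    conflicts: List[str] = []
    for value, rank in b.items():
        if value in merged and merged[value] != rank:
            conflicts.append(value)  # e.g. normal in one item, deprecated in the other
        else:
            merged[value] = rank
    return merged, conflicts
```

For the PubMed example, `merge_values({"5943422": "normal"}, {"5943422": "deprecated"})` keeps the normal-rank value and returns `["5943422"]` as a conflict. An automatic merge procedure would still need its own rule for such cases, which is exactly the extra complexity the comment describes.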
  •  Support I'm all for this option because it clearly shows that deleting identifiers is not allowed. Like MrProperLawAndOrder and MisterSynergy, I do not see point 1.4 as necessary and would remove it from the option. I would also say that point 2.3 is not necessary, because statements are now colored red and green, which makes it clear that a statement is deprecated. Collapsing them is therefore no longer necessary. --Gymnicus (talk) 08:31, 26 March 2022 (UTC)[reply]