Wikidata talk:Notability

This is the talk page for discussing improvements to Notability.
Use the "Add topic" button in the upper right-hand corner to begin a new discussion, or reply to one listed below.

For discussion of the "Exclusion criteria" section of this guideline, please see the /Exclusion criteria subpage.

Previous discussion at Wikidata:Project chat

See Wikidata:Requests_for_comment/Notability.

Category items

I know there's a discussion about Commons above, although I'm not sure where it's going. Following a discussion on the project chat at Wikidata:Project_chat#Commons_gallery_vs_category_sitelinks I'd like to propose changing the following condition:

In addition, sitelinks on category items to category pages on Wikimedia Commons are allowed if and only if they are linked with category pages on other Wikimedia sites. [with a footnote]

to:

Category items are only valid if they either a) have at least two sitelinks, or b) have one sitelink and are linked via topic's main category (P910) and category's main topic (P301) with an item which is notable in its own right.

The idea is that it's pointless to create category items unless they serve some purpose. The two possible purposes are a) connecting categories on two projects to provide sitelinks b) linking a category to a main item, to potentially provide additional sitelinks and to allow templates on the category to extract data from the main item. Commons categories are often sitelinked to the main item directly, but this can't be done in some cases because a gallery is already sitelinked. The new wording doesn't mention Commons specifically, or include the footnote referring to an old RFC. Ghouston (talk) 06:04, 10 June 2018 (UTC)
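As a rough illustration only, the proposed wording could be read as the check below. This is a sketch under stated assumptions, not part of the proposal: the `Item` layout, the placeholder QIDs in the usage note, and the `is_notable` callback are hypothetical, and the P910/P301 link is assumed to be reciprocal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

P910 = "P910"  # topic's main category
P301 = "P301"  # category's main topic


@dataclass
class Item:
    """Hypothetical, simplified view of a Wikidata item."""
    qid: str
    sitelinks: List[str] = field(default_factory=list)
    claims: Dict[str, List[str]] = field(default_factory=dict)


def category_item_is_valid(
    category: Item,
    items: Dict[str, Item],
    is_notable: Callable[[Item], bool],
) -> bool:
    """Apply conditions (a) and (b) of the proposed wording."""
    # (a) at least two sitelinks
    if len(category.sitelinks) >= 2:
        return True
    # (b) exactly one sitelink, plus a reciprocal P301/P910 link
    #     to an item that is notable in its own right
    if len(category.sitelinks) != 1:
        return False
    for main_qid in category.claims.get(P301, []):
        main = items.get(main_qid)
        if main is None:
            continue
        linked_back = category.qid in main.claims.get(P910, [])
        if linked_back and is_notable(main):
            return True
    return False
```

With a category item `QCAT` holding one Commons sitelink and a P301 claim to `QMAIN`, and `QMAIN` pointing back via P910, the check passes only if `is_notable(QMAIN)` holds. How notability of the main item is decided is deliberately left to the caller.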

I agree with not discriminating against Commons and applying any inclusion/exclusion criteria to all wikis, but I don't think we should restrict category items like that. It would make things unnecessarily complicated and create more maintenance work keeping track of and deleting categories which are no longer notable, re-adding statements when a category becomes notable again, etc. I personally think we should allow more or less all categories. If there are specific types of things we want to exclude, they should be added to Wikidata:Notability/Exclusion criteria (e.g. category redirects are already listed there). - Nikki (talk) 14:18, 27 June 2018 (UTC)
It wouldn't bother me if all category items could be created. There's also an argument from Commons that allowing items to be created for intersection categories (Commons has millions) would be useful, since infoboxes on such categories can display which main items the category is an intersection of. Such linking may also be useful for future structured data on Commons purposes. Admittedly the current wording already permits the creation of category items for categories from all projects except Commons, so the wording I suggested would loosen the requirement on Commons but restrict it on every other project. The question is whether anyone (still) has an objection to creating items for intersection categories from Commons. Ghouston (talk) 01:20, 29 June 2018 (UTC)
Category:Fox Glacier (Q55246571) is an example of an item which violates the current policy, and which in principle could be nominated for deletion, but which is needed for the Commons category sitelink and the infobox on the Commons category. Ghouston (talk) 02:01, 29 June 2018 (UTC)
  • So we will end up getting 4 million new items for Commons categories?
    --- Jura 05:45, 1 July 2018 (UTC)
I undid the edit to point 1 because this is probably the most important page on Wikidata, and if you want to change it you must open a Wikidata:Requests_for_comment. A single topic on this page is not sufficient; the consensus is insufficient. --ValterVB (talk) 11:12, 1 July 2018 (UTC)
@ValterVB: Considering RfCs here seem to last a year each, I'm not sure that's a useful way forward. I guess this could be raised at project chat (again!), if need be... Thanks. Mike Peel (talk) 11:56, 1 July 2018 (UTC)
Personally I don't have a strong opinion about this topic, but changing the rules on notability surely requires a strong consensus, and we can't have that without prior discussion. If it takes a lot of time, that isn't a problem; once we reach consensus we can all work together to apply it. --ValterVB (talk) 12:10, 1 July 2018 (UTC)
  • It might be easier to store these things directly on Commons. The countless discussions of the topic haven't convinced people that there is much benefit from these things at Wikidata, especially from the point of view of frequent users of Commons. Given that Commons is changing anyways, I don't think Wikidata is a suitable place for its temporary construction equipment. Maybe it's simply time to archive sitelinks to Commons galleries now. If we add 4 million items only to delete or let them linger afterwards, the strained Wikidata infrastructure is burdened for no common long term benefit.
    --- Jura 12:14, 1 July 2018 (UTC)
  • @Jura1: You seem to be thinking that the 4 million missing commons category links are all duplicates of galleries. That's not true - at most it's 100k, see User:Jheald/commons, but that's ignoring existing links. I've put a list of 1,000 random commons categories without sitelinks at commons:User:Mike Peel/Commons categories without Wikidata sitelink - have a look through that and you'll get a better idea of what the 4 million contain. (Hint: they aren't 4 million intersection categories either). Thanks. Mike Peel (talk) 20:25, 1 July 2018 (UTC)
  • All items that have a Commons category should have the Commons category as their sitelink, and if a gallery exists then it should be listed as a property. That will solve all the issues, including all the issues relating to potential infoboxes, and also all potential issues of notability here in Wikidata. Christian Ferrer (talk) 12:19, 1 July 2018 (UTC)
    • It's not needed for infoboxes to work there. Notability at Wikidata currently excludes these sitelinks. To clean up Commons, importing the defects here isn't really helpful for Wikidata.
      --- Jura 12:27, 1 July 2018 (UTC)
      • @Jura1: The infoboxes work most easily using the sitelinks, but it's true that you can set manual qids and they'll still work. However, that means messing around with figuring out qid numbers, rather than text, and that creates an unnecessary barrier for editors creating the links between the two. Also, it means a lot more manual work, as they can't be bot-added without the sitelinks. Plus, maintenance in the longer term is a pain, as you then have to maintain the qid matches on commons, rather than on wikidata - and I suspect that categories won't be going away any time soon... Thanks. Mike Peel (talk) 20:21, 1 July 2018 (UTC)
Even if it is not needed, and if what I say is not false, every Wikidata item notable enough to exist should have as its Wikimedia Commons sitelink the Commons category, when one exists (which is the norm, in the sense that all files are categorized but not all are in galleries). Christian Ferrer (talk) 12:32, 1 July 2018 (UTC)
If what I just said were in effect, then nobody would seek to create Wikimedia category items especially/only for Commons categories; this is, from my point of view, obviously a fact. Christian Ferrer (talk) 12:58, 1 July 2018 (UTC)

There are a number of points I would like to make here:

  • Firstly, a fundamental part of what Wikidata is for is to serve the structured data needs of the whole Wikimedia ecosystem. To be sure, Wikidata is here for a lot more than that as well. But if a Wiki project has a structural need for structured data, we are not at liberty to ignore that. It's part of what the WMF pays to keep the lights on for.
(Image: Commons-Wikidata links - 2015.svg)
  • Secondly, the existing clause needs to come out. At the moment it serves only to mislead and confuse. To start with, it claims justification by a citation to an RfC (reference 7 in this version) that has since been rejected in every particular. Any reference to that RfC from a current policy page is therefore downright confusing. But more fundamentally, the purpose of policy pages is to document current practice. The limitation that this clause sets out (no item that only sitelinks to a Commons category) is not current practice, and hasn't been for years.
The image on the right shows what has been accepted for at least two years. In the simple case when only a Commons category exists, it can sitelink directly to a main article here (as indeed 1.35 million currently do) (not shown). But in the case when eg a Commons gallery exists, then the agreed convention is that the Commons gallery should have the sitelink to the main item (since there can only be one), and in such cases the Commons category should be linked to an auxiliary category-type item here, with that and the main item tied together by reciprocal category's main topic (P301) and topic's main category (P910) statements. This has been aired many times over the last few months (eg most recently at Project Chat just a couple of weeks ago); it documents overwhelming current practice; and there is clear consensus for it -- a consensus already embodied in the existence of thousands of such items, even before the new items created this week.
  • A third thing to recognise is that needs evolve. Above, I said that a fundamental (though never exclusive) part of what Wikidata is for is to support the needs of other wikis. In the beginning, those needs were simple, restricted to simple interwiki-ing. WD:N takes account of that need, by saying that if there are two pages that could/should be joined by an interwiki link, then we must permit an item here to support it.
But needs evolve. Wikis become able to use Wikidata content in more sophisticated ways, which is also a legitimate need. In particular, Commons now has a range of standard templates, such as c:Template:Wikidata infobox, c:Template:Interwiki from wikidata, and c:Template:Taxonavigation, with usage counts going into the millions, that draw on Wikidata, and expect either a direct link or link using the above mechanism via an auxiliary category-type item. As an example like c:Category:Grade I listed churches in Bedfordshire shows (infobox template on the right), even a simple category combines topics (P971) statement here is already enough to support a description of the categories that -- for the first time -- is immediately understandable in all the languages Wikidata can translate its labels into. This is exactly what Wikidata was invented for. It is a huge step forward for Commons; as well of course as being a very powerful engine for improving Wikidata, as people seek to extend or correct data, and perhaps above all to fill in missing translations into their own languages. It makes no sense to be unambitious here.
  • Fourth, we need to have an eye to the future, and make sure we are doing what we can to help the work people are trying to do. Getting as many sitelinks as possible for Commons categories is now a critical priority for the Structured Data initiative, because it is through file categorisation, and specifically from wikidata items linked to those categories, that realistically the first structured data will be drawn. Currently there are about 1.9 million Commons categories matched to Wikidata items; but about 4.8 million that so far aren't.
A crucial step towards that is putting in all the sitelinks for categories we can match to Wikidata items, (i) so that those relationships are accessible for querying at scale, and (ii) to clear them out of the way for database queries trying to identify what's left. It also means we can put in infoboxes for those categories at the same time, very worth having in their own right, as already noted; but also getting them out of the way of a drive to get Commons editors to add and improve infoboxes (effectively a grass-roots drive to add data to Wikidata items to make them more complete, and get more of them matched to Commons categories, so those links and that data are then there and ready to use for the Structured Data project to call on).
  • I am happy to discuss refining what the exact terms of the new text should be.
But the old text needs to come out, straight away, which is why I am going to pull it again now, because it is a lie, that is actively misleading people. Citation 7 is wholly inappropriate, being as the RfC it links to has since been rejected in every aspect; and it is highly misleading and simply not true to pretend that items for Commons categories with just that sitelink are not permitted. We've got thousands of them, have had for years, and it is fatuous to pretend we are going to cripple the Commons templates that are now making such good use of them, still less to encourage people to do that. The old clause is dead, and it's starting to smell. The time to cut it out is now.
I'd also note that nobody has yet claimed any practical problem that these items are causing, nor has anyone identified any practical alternative to what is being done. Jheald (talk) 22:30, 1 July 2018 (UTC)
  • Pardon my ignorance, but is not one of the purposes of Structured Data for Commons to make intersection categories such as c:Category:Grade I listed churches in Bedfordshire disappear? Christian Ferrer (talk) 04:41, 3 July 2018 (UTC)
    @Christian Ferrer: Not as things currently stand, see commons:Commons:Structured_data/About/FAQ#Will_Commons_still_have_categories?_Will_Commons_categories_disappear?. Also, most of this discussion is not about intersection categories. Thanks. Mike Peel (talk) 11:00, 3 July 2018 (UTC)
    @Christian Ferrer: There are certainly people out there who think that Structured Data on Commons will make all categories there irrelevant (despite the somewhat mealy-mouthed assurance Mike has cited) -- indeed perhaps even some who would see that as a primary objective for structured data -- because in their view the new search system will give people everything they want, adding to their search facet by facet, prioritising the best, most valuable, most interesting images at each view, and allowing the whole collection to be explored in a much freer, more flexible, more spontaneous, not pre-planned way.
It's a very enticing vision.
But, myself, I am dubious that the new search will ever kill off categories and the category hierarchy they sit in, no matter how good and slick the search interface becomes, and how many very nice new features and alternate options to view and order or segment and filter the results that it offers.
Nor do I think the community would allow them to ever be completely replaced, because for all the wonders of search, there is something very transient and impermanent about the results. I think there will continue to be a demand from people to be able to create and curate views of parts of the collection with more of a sense of concreteness and solidity, which they can work on as a finite focus project to improve, annotate with relevant additional information and links, and which are there as a specific landing page for people coming to Commons from a particular wiki article. There's a lot of curation and personal engagement from Commons editors that has gone into categories, and I don't think that will be given up lightly, nor the hierarchical structure that makes these views navigable -- and which provides useful pre-curated, pre-computed ways to drill down through the collection.
Eventually I expect the two systems will find ways to continue to persist side by side -- and enhance each other. If Structured Data on Commons does take off (by no means a given at this stage), it should make it possible to add a lot of machine-support to categorisation, making it much more robust and complete; and I think it's also likely some of the additional view options (eg show in order of quality; in order of date created; in order of date depicted; etc) would also be ported to categories, making the category-viewing experience richer.
In the other direction, one shouldn't underestimate the sheer amount of work it's going to take to get the Structured Data search to anywhere near the level of the current category system. Even the search mechanics are decidedly non-trivial. Consider a search such as eg: "Portrait of a man in a hat". First the user needs to translate that (presumably via the faceted menu system) into depicts (P180) human (Q5), qualifier sex or gender (P21) male (Q6581097), plus depicts (P180) hat (Q80151), but then to satisfy that the system needs to retrieve results from search over all men and all hats, because a picture like Portrait of Doge Leonardo Loredan (Q1759759) will not be tagged depicts (P180) human (Q5) + depicts (P180) hat (Q80151), but rather depicts (P180) Leonardo Loredan (Q250210) shown with features (P1354) doge's hat (Q1134210).
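To make the expansion step concrete, here is a minimal sketch of the matching described above. Everything about the data layout is an assumption: the toy in-memory graph stands in for Wikidata, and the relations "Leonardo Loredan is an instance of human" and "doge's hat is a subclass of hat" are assumed for illustration rather than read from the live data.

```python
# Toy graph; the specific relations are assumptions for illustration.
INSTANCE_OF = {"Q250210": {"Q5"}}        # P31: Leonardo Loredan -> human
SUBCLASS_OF = {"Q1134210": {"Q80151"}}   # P279: doge's hat -> hat
SEX_OR_GENDER = {"Q250210": "Q6581097"}  # P21: male

HUMAN, MALE, HAT = "Q5", "Q6581097", "Q80151"


def classes_of(qid):
    """The entity itself, its P31 classes, and all their P279 ancestors."""
    frontier = [qid, *INSTANCE_OF.get(qid, ())]
    seen = set()
    while frontier:
        cur = frontier.pop()
        if cur not in seen:
            seen.add(cur)
            frontier.extend(SUBCLASS_OF.get(cur, ()))
    return seen


def depicted_entities(statements):
    """Yield depicts (P180) values and their 'shown with features' (P1354) qualifiers."""
    for st in statements:
        yield st["depicts"]
        yield from st.get("shown_with_features", [])


def matches_man_in_hat(statements):
    """True if the tags imply both a male human and some kind of hat."""
    entities = list(depicted_entities(statements))
    has_man = any(
        HUMAN in classes_of(e) and SEX_OR_GENDER.get(e) == MALE for e in entities
    )
    has_hat = any(HAT in classes_of(e) for e in entities)
    return has_man and has_hat


# Hypothetical tagging of Portrait of Doge Leonardo Loredan (Q1759759)
loredan_portrait = [{"depicts": "Q250210", "shown_with_features": ["Q1134210"]}]
```

Here `matches_man_in_hat(loredan_portrait)` succeeds only because both entities are expanded to their classes; a literal match on human (Q5) and hat (Q80151) would miss the picture.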
And of course the data returned from the Structured Data search will only be as good as the tagging -- for example, Bellini's doge doesn't currently have a shown with features (P1354) doge's hat (Q1134210) qualifier, so wouldn't be returned. 40 million images to comprehensively describe with tags is going to be a mammoth undertaking, and mammoth to maintain. In my view the initial wave of tags are going to have to come from the categorisation, because there is nowhere else to realistically get the information from, in a form that is mapped (or even readily mappable) to the vocabulary of Wikidata. That's why, to make a success of structured data, in my view we need to be getting category combines topics (P971) descriptions in for categories like "Interior of Church <X> in Place <Y>", because often that intersection category is the only categorisation information on the image. If we can systematically record what categories like that represent, then we will be able to immediately tag for Structured Data that the image is of Church <X>, specifically the interior. But without having translated the category, we're not going to be able to do that. So that's why I think it's so critical to map the meaning of categories, including intersection categories, to the Wikidata vocabulary now, so that that is already done by the time the depicts (P180) property for images starts becoming available this October/November. In my view, it's going to be absolutely essential for the first wave of tagging. But your question was directed at the longer term, and longer term too I think categories are going to continue to be relevant, because I think they give a structure for curation that we are going to continue to need. We're going to need to go on adding tags to images to improve search, even after the first wave. And I suspect that category refinement, using existing tools like Cat-a-lot, is going to continue to be one of the most efficient and inviting ways to do that.
We may well develop similar tools to refine a search result by adding tags, but I am not sure they will have the same appeal, of having completed a work of curatorship that one knows is then in the structure, and that one can point to or link to as the work one has done. I don't expect Commons will let that go lightly. Jheald (talk) 11:32, 3 July 2018 (UTC)
In fact I commented here because someone put a message on my talk page after I nominated items (created by me) for deletion. But the truth is that I'm not competent at all here, and that this is not one of my main subjects of interest. Therefore there is no need to ping me or to come and talk about this on my talk page, thank you, and sorry again if I disturbed someone. Regards, Christian Ferrer (talk) 11:46, 3 July 2018 (UTC)

I see the need to adapt our policies, however I have some concerns:

  1. There are around 4.1M category items right now in Wikidata, so this proposal roughly aims to double that number. However, even rather simple queries involving category items are at the edge of timing out already right now. We should make sure that Wikidata's infrastructure is capable of dealing with the extra items as well.
  2. The notability criteria at Commons sometimes seem even less restrictive than ours, and my feeling is that a policy change as proposed would offer an easy path to put purely promotional content to Wikidata via a Commons category that nobody contests. How does the Commons community try to avoid such behavior?
  3. Many Commons categories are technical intersection categories combining two or more distinctive properties of the members. Their purpose is to provide ways of access that the category system otherwise lacks (dynamic intersection), but they are there in substantial numbers. I can't spontaneously imagine any scenario where intersection categories need a Wikidata item, as long as there are no other interwiki links than to Commons. Can we somehow exclude intersection categories from being connected to Wikidata?

--MisterSynergy (talk) 20:20, 4 July 2018 (UTC)

@MisterSynergy: On your last point, it is useful to have Wikidata items for intersection categories, and (as per this proposed text) I would specifically not exclude them, for at least two reasons: (a) because the Wikidata infobox template is actually very useful on such categories, translating the meaning of them for non-English speakers (see for instance the example of c:Category:Grade I listed churches in Bedfordshire noted above); and (b) because such categories may nevertheless contain images -- indeed many terminal categories (ie categories with no subcategories) are in fact intersection categories, or can be thought of as such, eg "interior of church X" -- and so we're going to need the meaning of the category translated into Wikidata items, in order to be able to suggest Structured Data topics for images in the category. Jheald (talk) 21:00, 4 July 2018 (UTC)
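A minimal sketch of point (b), assuming the category combines topics (P971) statements have already been exported into a simple lookup keyed by Commons category name. The lookup, the placeholder identifiers `Q_CHURCH_X` and `Q_INTERIOR`, and the function name are hypothetical; nothing here reflects an actual Structured Data API.

```python
# Hypothetical export of P971 (category combines topics) statements,
# keyed by Commons category name. The identifiers are placeholders.
CATEGORY_COMBINES_TOPICS = {
    "Interior of Church X in Place Y": ["Q_CHURCH_X", "Q_INTERIOR"],
}


def suggest_depicts(file_categories, combines=CATEGORY_COMBINES_TOPICS):
    """Suggest topics for a file from the P971 topics of its categories.

    Categories with no mapping contribute nothing, which is the gap
    that translating intersection categories is meant to close.
    """
    suggestions = []
    for category in file_categories:
        for qid in combines.get(category, []):
            if qid not in suggestions:  # keep first-seen order, no duplicates
                suggestions.append(qid)
    return suggestions
```

A file whose only category is the intersection category would get both the church and "interior" suggested; a file in an unmapped category would get nothing, so a human would still need to review the result either way.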
I'm not really up to date about Commons structured data progress (unfortunately), but I always thought that the initiative would make the intersection category monster superfluous once more (better) structure is available. Am I wrong here, and they are going to be retained?! That would be a huge mistake in my opinion. --MisterSynergy (talk) 21:18, 4 July 2018 (UTC)
@MisterSynergy: See my answer to Christian Ferrer, a few paragraphs above. But a few points: (i) Many intersection categories are actually right at the bottom of the hierarchy, so they would be what one would directly read topic information from; (ii) per Mike Peel's study of 1000 yet-to-be-matched categories mentioned above, perhaps less than half of the remaining categories are intersection categories; (iii) faceted search will give people a powerful different way to drill down to Commons content. But as I wrote to Christian Ferrer, I think categories will continue to have their uses in parallel with it; and those categories will continue to need to be linked hierarchically, to fit them into an organised comprehensible structure. That naturally leads to the intersection categories. It's possible that there won't be so many people creating new ones; but I don't think the old ones are likely to go away; and in any case I suspect (iv) there will continue to be some value in having pre-curated ways to facet the collection, in parallel to what may be possible with the new search. Jheald (talk) 21:56, 4 July 2018 (UTC)
Thanks, although I really don't like this perspective (to be honest: I can observe something similar at dewiki. A considerable number of editors have invested their entire Wikipedia career into the categorization tree, and they are now unable to let go in favor of a superior technology).
Actually, my observation is that intersection categories are almost always somehow incomplete and badly maintained, very often very inconsistent by not really inheriting the properties of the supercategories, and furthermore much less stable in the sense that they may be subject to large-scale deletion procedures much more than regular non-intersection categories (I am the one who takes most of the workload to remove the residual empty items from Wikidata these days). Since Commons uses them much more than any other project, I see a considerable amount of extra work arising particularly from intersection categories for our community as well. --MisterSynergy (talk) 22:07, 4 July 2018 (UTC)
@MisterSynergy: The completeness is something I think may get a big help from better integration with Wikidata, making it easier for bots or machine-assisted editors to discover what ought to be in the category, and compare it to what actually is there. As to whether there's a lot of deletion, I'm a bit dubious. It's not something I had particularly noticed at Commons. Jheald (talk) 22:35, 4 July 2018 (UTC)
I'm also a bit surprised with the focus on categories, because as MisterSynergy said they could be replaced with superior technology. As I see it, a category could be replaced by a (cached) query. We still don't have the means to do that, but that doesn't mean that it is not possible. Instead of including manually the category or intersecting category on each file, it is better to add the relevant information that will make the file show up in all relevant queries. It should be possible to have a hierarchy of queries too, in fact I find it better to have subqueries than to have subcategories. I don't understand why this option is not being considered to offer an alternative to categories. Is it because people are too invested in categorization? Or because they cannot envision something different? I would understand the second because it doesn't exist yet, but I feel it is important to explain that other options are possible and that categories (in the classic sense) are not the only way to do things.--Micru (talk) 11:52, 5 July 2018 (UTC)
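A toy sketch of the "category as a cached query" idea, with subqueries in place of subcategories. This only illustrates the concept: the file names, the tag vocabulary, and the `TagQuery` class are all hypothetical, and no such facility exists on Commons as far as this discussion states.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import FrozenSet

# Hypothetical per-file tags standing in for structured data on files.
FILE_TAGS = {
    "a.jpg": {"church", "exterior"},
    "b.jpg": {"church", "interior"},
    "c.jpg": {"castle", "interior"},
}


@dataclass(frozen=True)
class TagQuery:
    """A 'category' expressed as the set of tags a file must carry."""
    required: FrozenSet[str]

    def refine(self, *extra: str) -> "TagQuery":
        """Subquery: everything this query requires, plus extra tags."""
        return TagQuery(self.required | frozenset(extra))


@lru_cache(maxsize=None)
def run(query: TagQuery):
    """Return matching file names; results are cached per query."""
    return tuple(
        sorted(name for name, tags in FILE_TAGS.items() if query.required <= tags)
    )


churches = TagQuery(frozenset({"church"}))
church_interiors = churches.refine("interior")
```

Because a subquery only adds requirements, its results are always a subset of its parent's, which gives the hierarchy for free. The cache in this sketch is never invalidated, so a real system would need to refresh it whenever file tags change.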
  • I would support this for one simple reason besides all the others: We could delete P373. We should manage those connections via our data item connections, not by hard-coding the name of a category into an item. --Izno (talk) 18:02, 8 July 2018 (UTC)

Notability criteria for Commons

Hi, I copy the discussion from c:Commons talk:Structured data/Get involved/Feedback requests/Properties for Commons#Notability criteria for separate items, as this needs more input. Should these objects have a separate item in Wikidata? Thanks for your comments. Regards, Yann (talk) 18:19, 5 July 2018 (UTC)

  • I would say yes. We can find references for that in currency catalogues.
For coins and banknotes, I would expect each distinct value to be notable (and in many cases we'll already have items, e.g. 1 paisa (Q28179451)), I'm not sure whether to split the variants (e.g. we could have two items, one for the nickel-brass version, one for the aluminium version, or we could have one item which has start/end dates on the material used (P186) statements), but I wouldn't expect each year/location of minting to have separate items. I wouldn't expect a single individual coin/banknote to have an item unless it's notable in some way (e.g. has an identifier as part of a museum collection). - Nikki (talk) 20:31, 7 July 2018 (UTC)
Nikki: Thanks for your message. There are already 3 different coins for 1 paisa (Q28179451), and there may be more. For 1 rupee, there would be at least a dozen different coins over the years. So how do we split the information? Otherwise, I agree with you about notability. Regards, Yann (talk) 23:33, 7 July 2018 (UTC)
I would say all of these variants should be notable.--Ymblanter (talk) 19:20, 9 July 2018 (UTC)

Monitoring the reciprocal use

The notion of "notability" is subjective and error-prone... A semi-automated filter is "curated by reciprocal use" together with its monitoring; see https://github.com/OSMBrasil/semantic-bridge

WdOsm-semanticBridge.jpg

--Krauss (talk) 13:38, 7 July 2018 (UTC)