Wikidata talk:WikiProject Categories

WikiProject Categories
WikiProject to solve any issues regarding categories.


Stubs, templates and category combines topics

Hi everyone! On this first day of the WikiProject, I want to start by discussing an issue that creates a lot of constraint violations.

So, at the moment there are 16465 cases of instance of (P31)  Wikimedia category of stubs (Q24046192) + category combines topics (P971)  Wikipedia:Stub (Q4663261). However, 12511 of them are constraint violations because category combines topics (P971) has only that value, even if it should have multiple values; the remaining items have category combines topics (P971)  Wikipedia:Stub (Q4663261) + other category combines topics (P971) indicating stubs' topics.

  1. My proposal is substituting category combines topics (P971)  Wikipedia:Stub (Q4663261) with category contains (P4224)  Wikipedia:Stub (Q4663261) and substituting the remaining category combines topics (P971) with main subject (P921) used as qualifier of category contains (P4224)  Wikipedia:Stub (Q4663261).

Same problem regarding 216 cases of instance of (P31)  Wikimedia templates category (Q23894233) + category combines topics (P971)  Wikimedia template (Q11266439) (56 of which are constraint violations of category combines topics (P971)) and 34442 cases of instance of (P31)  Wikimedia category (Q4167836) + category combines topics (P971)  Wikimedia template (Q11266439) (34162 of which are constraint violations of category combines topics (P971)).

  1. My proposal is substituting instance of (P31)  Wikimedia category (Q4167836) with instance of (P31)  Wikimedia templates category (Q23894233), substituting category combines topics (P971)  Wikimedia template (Q11266439) with category contains (P4224)  Wikimedia template (Q11266439) and substituting the remaining category combines topics (P971) with main subject (P921) used as qualifier of category contains (P4224)  Wikimedia template (Q11266439).

Thank you for your interest in categories! Bye, --Epìdosis 13:04, 8 July 2018 (UTC)
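The two proposals above can be sketched as a transformation on a simplified, in-memory picture of an item. This is only an illustration under stated assumptions: the dict layout, the function names `violates_p971` and `migrate`, and the example topic Q11424 (film) are hypothetical and not part of the proposal; a real migration would go through a bot or the Wikidata API.

```python
from copy import deepcopy

STUB = "Q4663261"                 # Wikipedia:Stub
TEMPLATE = "Q11266439"            # Wikimedia template
TEMPLATES_CATEGORY = "Q23894233"  # Wikimedia templates category


def violates_p971(item):
    """P971 is meant to combine several topics, so a lone value is a violation."""
    return len(item.get("P971", [])) == 1


def migrate(item, marker, new_p31=None):
    """Return a copy of `item` with `marker` moved from P971 to P4224.

    The remaining P971 values become P921 qualifiers of the new P4224
    statement. If `new_p31` is given, it replaces the P31 value.
    """
    topics = item.get("P971", [])
    if marker not in topics:
        return deepcopy(item)  # nothing to migrate
    result = deepcopy(item)
    del result["P971"]
    result["P4224"] = [{
        "value": marker,
        "qualifiers": {"P921": [t for t in topics if t != marker]},
    }]
    if new_p31 is not None:
        result["P31"] = [new_p31]
    return result


# Hypothetical stub category about films (Q11424 is used only as an example topic).
stub_category = {"P31": ["Q24046192"], "P971": [STUB, "Q11424"]}
migrated_stub = migrate(stub_category, STUB)

# Hypothetical template category currently typed as a plain Wikimedia category.
template_category = {"P31": ["Q4167836"], "P971": [TEMPLATE]}
migrated_template = migrate(template_category, TEMPLATE, new_p31=TEMPLATES_CATEGORY)
```

After migration neither item carries P971 any more, so the single-value constraint on P971 no longer applies to it.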

Jura
PKM
ValterVB
Jheald
Ghuron
Infovarius
Sannita
Avatar6
Pasleim
John Samuel


Notified participants of WikiProject Categories. Please give your opinion! --Epìdosis 12:29, 11 July 2018 (UTC)

Hi all! Some thoughts on the topic of template categories: 1. The item Wikimedia templates category (Q23894233) and its subclasses were created and used much earlier than P4224. 2. ...so some wikis have been using property P31 in the logic of their category-related templates; deleting the Wikidata items would break that logic. 3. A class name such as "Wikimedia navboxes category" is much easier to understand than a statement constructed from two or more Wikidata properties, at least in some languages. 4. Some wikis use pseudo-namespaces within the category namespace, e.g. Категория:Навигационные шаблоны:<topic> and Категория:Шаблоны:<topic> in ruwiki and its satellite East Slavic wikis. --Avatar6 (talk) 12:52, 30 July 2018 (UTC)

Metacategory vs. Category

We previously had a short chat about that with @Jura1:, but I believe we can now involve a few other interested participants here. It would be very convenient to somehow "mark" metacategories (categories that are intended to include only other categories). Although I understand that we don't want to "mirror the structure of Wikipedia categories with P31/P279 on Wikidata", I see nothing particularly wrong with marking such items as P31:Q30432511 (instead of P31:Q4167836) --Ghuron (talk) 12:26, 11 July 2018 (UTC)

I support instance of (P31)  metacategory in Wikimedia projects (Q30432511) as I support instance of (P31)  Wikimedia templates category (Q23894233) because these categories contain not articles, but respectively other categories and templates. --Epìdosis 12:33, 11 July 2018 (UTC)
  • How would we know what Wikipedia/etc. users put in there? Would we have to monitor them? Redoing P31 periodically based on people's usage? Why complicate things for users who want to check a structure by splitting the topic among several properties? The point of a flat P971 is that it's flat ;)
    --- Jura 12:39, 11 July 2018 (UTC)
    • @Jura1: there is nothing wrong with a bot assigning instance of (P31)  Wikimedia category (Q4167836) to all new categories (including metacategories). But if someone notices that a particular category is in fact a metacategory, she can narrow down the P31 value. I have nothing against P971, and I'm using it extensively. The problem is that there is no universal way to express the statement "this is a metacategory" via P971. Instead I can say category combines topics (P971)  by country (Q19360703) or category combines topics (P971)  by city (Q18683478) or category combines topics (P971)  by genre (Q42903116). And if I want to get "all non-meta-categories", I will have to use FILTER NOT EXISTS {?cat wdt:P971/wdt:P31 wd:Q24571886}? It's going to be slow :( --Ghuron (talk) 13:05, 11 July 2018 (UTC)
      • The problem is that people may not want any category items, so just excluding any P31 with a given value is way easier.
        It's still not clear how you decide which categories contain only categories.
        --- Jura 13:12, 11 July 2018 (UTC)
        • Well, judging mostly from the title (e.g. I assume that Category:Argentine women by occupation (Q8262772) is intended to contain other categories). More formally, there is Q4048796, which is used quite extensively, at least in en-wiki:
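          # MW2SPARQL exposes MediaWiki category membership as triples
          PREFIX mw: <http://tools.wmflabs.org/mw2sparql/ontology#>
          SELECT ?item ?page WHERE {
            # Keep the join order as written, so the remote lookup runs first
            hint:Query hint:optimizer "None" .
            # Pages placed directly in en-wiki's "Container categories"
            SERVICE <http://tools.wmflabs.org/mw2sparql/sparql> {
              ?page mw:inCategory <https://en.wikipedia.org/wiki/Category:Container_categories> .
            }
            # Map each page to its Wikidata item and keep only category items
            ?page schema:about ?item . ?item wdt:P31 wd:Q4167836
          }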
          
          --Ghuron (talk) 13:29, 11 July 2018 (UTC)
What we need perhaps is to focus more narrowly on categories of the type X by Y, the topic of recent discussion at Wikidata:Project_chat#Category:A_[split_up_by]_B.
For the record, I also dislike the idea of introducing lots of subclasses of Wikimedia category (Q4167836). I would prefer to indicate contents or attributes of the category by using additional statements. Jheald (talk) 17:21, 11 July 2018 (UTC)
One idea might be to identify such classes using something like
<category item> category combines topics (P971) "X"
<category item> category combines topics (P971) "Y" object has role (P3831) "Partition class"
Category items with statements of this pattern would be fairly easy to require or to exclude in a query. Jheald (talk) 17:37, 11 July 2018 (UTC)
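As a rough illustration of why this pattern would be easy to require or exclude, here is a minimal sketch over simplified in-memory items. Everything in it is an assumption made for the example: no "Partition class" item is named in the discussion, so `PARTITION_ROLE` is a placeholder, and the dict layout and function names are hypothetical. The item IDs for Category:Albums by year, album and year are borrowed from elsewhere in this discussion.

```python
# Placeholder only: the discussion does not name an actual "Partition class" item.
PARTITION_ROLE = "Q_PARTITION_CLASS"


def partition_classes(item):
    """Return the P971 values that carry the P3831 'partition class' qualifier."""
    return [
        statement["value"]
        for statement in item.get("P971", [])
        if PARTITION_ROLE in statement.get("qualifiers", {}).get("P3831", [])
    ]


def is_x_by_y(item):
    """True if any P971 statement on the item is marked as the partitioning class."""
    return bool(partition_classes(item))


# Category:Albums by year (Q6695739): topics album (Q482994) and year (Q577),
# with year marked as the class the category is split up by.
albums_by_year = {
    "id": "Q6695739",
    "P971": [
        {"value": "Q482994"},
        {"value": "Q577", "qualifiers": {"P3831": [PARTITION_ROLE]}},
    ],
}
# Hypothetical ordinary category with no partitioning qualifier.
plain_category = {"id": "Q_EXAMPLE", "P971": [{"value": "Q482994"}]}

categories = [albums_by_year, plain_category]
non_meta_ids = [c["id"] for c in categories if not is_x_by_y(c)]
```

Because the test only looks for a qualifier on any P971 statement, it works the same way whichever class ("by country", "by year", ...) is used for the split.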
As an exploration of what we have to deal with, here's a quick query looking at 30,000 categories with en-labels of this form, to see the sort of partitioning classes that may be most relevant. tinyurl.com/y85gqesz. "By country" (4498), "by year" (807), "by nationality" (771), "by state" (755) lead the list. Jheald (talk) 18:01, 11 July 2018 (UTC)
For what it's worth, here are the current uses of category combines topics (P971) on such categories:
Jheald (talk) 18:28, 11 July 2018 (UTC)
The "by ..." items are all members of meta category criterion (Q24571886), created by User:Shinnin. According to Reasonator, there are 38 items currently in this class. It's an interesting approach, but I don't think it scales well -- I think the qualifier I have suggested above would be a more general approach. On the other hand, perhaps it doesn't need to scale very far -- the number of partitioning classes we need to cope with is fairly finite, at least judging by the query above.
Pinging @Shinnin: Are there any particular advantages of your model that you would like to bring into the discussion? Jheald (talk) 18:40, 11 July 2018 (UTC)
@Jheald: I didn't create meta category criterion (Q24571886). "By <something>" type of items existed in Wikidata before I started editing here. I think by year (Q29053180) is my creation, the rest are not. I've used these types of items mainly because they seemed to be the de facto way of modeling these types of categories.
I do think that after category contains (P4224) was created, many of the current use cases could have been changed to use it instead of P971. E.g. Category:Albums by year (Q6695739): category contains (P4224)  album (Q482994) / grouped by year (Q577). However, this approach would only work for categories that group articles based on their type, not for the ones that group them based on a common topic (e.g. Category:Geography by country (Q6491485)). All in all, I'm not an advocate of the current system. --Shinnin (talk) 20:10, 11 July 2018 (UTC)
I can see your point now and I believe I need to clarify what I'm trying to do. I want to exclude categories similar to X by Y from the scope of my queries. Although I believe that setting instance of (P31)  metacategory in Wikimedia projects (Q30432511) for them is the most efficient way to achieve that, I'm not against any other way of labeling them that would fulfill my needs. We've been discussing the idea of using P971 in the thread above, and I still do not see an efficient way to exclude them (compare this to this) --Ghuron (talk) 18:45, 11 July 2018 (UTC)
@Ghuron: I'm not sure that I would read too much into that comparison, unless that count is literally all you want to do, in which case you can calculate a count excluding a particular subset efficiently simply by subtraction.
The first query is fast because the query engine never has to materialise the items, it can just count the difference between two index positions.
As soon as you are wanting anything more concrete (typically involving a more restricted solution set), the difference in time between the two queries would be a lot smaller. Jheald (talk) 20:49, 11 July 2018 (UTC)
Also worth noting that this query is not particularly happy either. Jheald (talk) 20:56, 11 July 2018 (UTC)
Fair point about count, let's assume I want to get labels for all categories w/o P4224 (for a machine learning experiment). No single query can return 4M records in 60 seconds, so I'm using LIMIT/OFFSET. Let's see how well each of the discussed schemas fits here:
One might argue that my task is rare, but even if we accept this, I still fail to see any reasons against using instance of (P31)  metacategory in Wikimedia projects (Q30432511) except purely aesthetic ones (which are very subjective) --Ghuron (talk) 11:45, 12 July 2018 (UTC)
@Ghuron: It's not a great surprise that putting a LIMIT 50000 on a COUNT query with a one-line answer fails to be particularly effective :-)
Starting with this (or its OPTIONAL { ... } FILTER (!bound(...)) alternative) might be more interesting. Jheald (talk) 13:03, 12 July 2018 (UTC)
@Jheald: yep, too many query windows open, but I'm still getting a timeout on OFFSET 1000000 LIMIT 1. Couldn't figure out how to use OPTIONAL { ... } FILTER (!bound(...)) here because (unlike P4224) we expect several values on P971, and having a P3831 qualifier on ANY of them should eliminate the category from the output --Ghuron (talk) 13:09, 12 July 2018 (UTC)
@Ghuron: I would do it this way: generate a controlled number of categories first, then start applying tests to them. Jheald (talk) 13:26, 12 July 2018 (UTC)
On a tranche of 500,000 categories, the hash join to exclude p:P971/pq:P3831 is adding about 8 seconds (17 seconds vs 9 seconds). Jheald (talk) 13:31, 12 July 2018 (UTC)
This variant is a bit slower, at 29 seconds. Jheald (talk) 13:36, 12 July 2018 (UTC)
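The approach described above (first generate a bounded tranche of categories, then apply the exclusion test to it) might look roughly like the following. This is a sketch under assumptions, not the query linked in the thread: the helper name `tranche_query` is hypothetical, and the SPARQL text it builds is one plausible way to write the subquery plus the p:P971/pq:P3831 exclusion.

```python
def tranche_query(limit, offset=0):
    """Build SPARQL text that tests only a bounded tranche of category items."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return f"""SELECT ?cat WHERE {{
  # Step 1: materialise a bounded tranche of category items.
  {{
    SELECT ?cat WHERE {{ ?cat wdt:P31 wd:Q4167836 . }}
    LIMIT {limit} OFFSET {offset}
  }}
  # Step 2: drop categories where any P971 statement has a P3831 qualifier.
  FILTER NOT EXISTS {{ ?cat p:P971/pq:P3831 [] . }}
}}"""


# Tranche size taken from the figure mentioned in the thread.
query_text = tranche_query(500000)
```

Note that LIMIT/OFFSET without ORDER BY does not guarantee stable tranches between runs, so successive pages obtained this way may overlap or miss items.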
@Jheald: as long as it fits within the 60s timeout, a few seconds slower doesn't matter. Your approach works for both the P3831 qualifier and by country (Q19360703)/by city (Q18683478)/by genre (Q42903116)/etc (see [2]). But my understanding was that if something can be expressed without qualifiers, it should be expressed so. Shouldn't we use the children of meta category criterion (Q24571886)? --Ghuron (talk) 13:47, 12 July 2018 (UTC)
@Ghuron: I would delete that entire class, because I think it just causes difficulty and confusion -- starting with the name, which is deeply opaque. IMO, if something is being used as the partitioning class, it is better to use the regular class-item for that, with a qualifier to say that that is its role, rather than to expect people to create and consistently use parallel different items. Jheald (talk) 14:13, 12 July 2018 (UTC)
@Jheald, Jura1, Epìdosis: So apparently we have 3 or 4 competing proposals here (not counting what was discussed in Wikidata:Project_chat#Category:A_[split_up_by]_B). Since any of them will work for me, I don't really care which one is selected, but I do want us to select one (so I can start using it). Please advise how we should proceed from here --Ghuron (talk) 17:12, 12 July 2018 (UTC)

A metacategory cannot be a strict class on wikis; it is only an intention. If a metacategory contains pages, that is not an error, just a warning, because it can happen for many reasons, e.g. the wiki does not have a suitable subcategory for that metacategory: either it doesn't exist yet, or such a subcategory would not be very useful, at least for now. --Avatar6 (talk) 13:13, 30 July 2018 (UTC)

Triple and double information

Hi all! I want to restart the previous discussion from another point of view: redundant information. These are the cases:

In your opinion how should we deal with these? Thank you, --Epìdosis 19:52, 29 July 2018 (UTC)

  • I think P4224 was created knowing that it would duplicate P971. It's just meant to provide some different functionality. It may or may not be useful for stub categories. Maybe one shouldn't fill P4224 unless actually needed.
    --- Jura 05:39, 30 July 2018 (UTC)
  • I do not see huge issues with duplicated information between P971 and P4224. Of course, once a data model for categories is agreed and accepted, it will be more elegant than the examples above. I prefer to work with P4224, Jura with P971, so the examples above merely represent "work in progress" --Ghuron (talk) 06:40, 30 July 2018 (UTC)

Template categories - find a solution

Hi all! I'm writing again to try to solve the problem of template categories. My proposal is:

Template type | instance of (P31) | category combines topics (P971) + category contains (P4224)
General purposes | Wikimedia templates category (Q23894233) | Wikimedia template (Q11266439)
Navboxes | Wikimedia templates category (Q23894233) | Wikimedia navigational template (Q11753321)
Infoboxes | Wikimedia templates category (Q23894233) | Wikimedia infobox template (Q19887878)
Userboxes | Wikimedia templates category (Q23894233) | Wikimedia userbox template (Q20769160)
Babel | Wikimedia templates category (Q23894233) | Wikimedia user language template (Q19842659)

And, in my opinion, Wikimedia navboxes category (Q13331174) and Wikimedia infobox templates category (Q23894246) should be deleted. Here you can see how many times they are used. What's your opinion? --Epìdosis 08:30, 25 August 2018 (UTC)

You are right that Wikimedia navboxes category (Q13331174) and Wikimedia infobox templates category (Q23894246) seem to be redundant. But they preserve category classes in parallel with category contains (P4224) and category combines topics (P971); they were created for, and are currently used in, the template uk:Шаблон:Категорія шаблонів to classify template categories by "pseudo category-namespaces". Besides, they can simply be used for descriptions that differ from, and are more exact than, the overused description "Wikimedia category". --Avatar6 (talk) 06:41, 28 August 2018 (UTC)

Wikidata property usage tracking categories

Hi all! How would you describe a Wikidata property usage tracking category (Q24514938)? At the moment two different systems are used:

  1. instance of (P31)  Wikimedia administration category (Q15647814) + category combines topics (P971)  Wikidata property usage tracking category (Q24514938) (around 2700 cases)
  2. instance of (P31)  Wikidata property usage tracking category (Q24514938) (around 50 cases)

I'm not sure about which is the best: what's your opinion? --Epìdosis 08:40, 25 August 2018 (UTC)

As for me, the 2nd way is simpler and more exact. A sole P31 is always better, I think. P971 should contain the PID and the tracking case (same as WD, differs from WD, no WD...). --Avatar6 (talk) 06:48, 28 August 2018 (UTC)
In the second case category combines topics (P971)  value from Wikidata (Q40218570) (and the other 3 similar) should migrate to category contains (P4224)  value from Wikidata (Q40218570) and category combines topics (P971)  Wikidata property usage tracking category (Q24514938) should obviously be removed. Is that correct, @Avatar6, Jura1:? --Epìdosis 09:49, 31 August 2018 (UTC)

I support this change, as it makes things easier:

MisterSynergy (talk) 19:51, 31 August 2018 (UTC)

Thank you. If there is no opposition, I will do it on the 7th of September. --Epìdosis 09:46, 1 September 2018 (UTC)
  • I don't really care what you do with P4224, but it seems that the proposal doesn't meet the English description of the property: "category contains elements that are instances of this item". As for P971, please don't remove the value from there.
    --- Jura 10:14, 1 September 2018 (UTC)

Metacategorization of categories

For serial categories by date it really helps. Please see the idea at Category:Populated places established in 2018 (Q48328925), Category:Populated places established in 2017 (Q28605372), Category:Populated places established in 2016 (Q24089347), Category:Populated places established in the 2010s (Q7476468). --Avatar6 (talk) 06:16, 28 August 2018 (UTC)

@Avatar6: The use of subclass of (P279) for category items is deprecated, so it should be removed; regarding the other statements in the items, I support them. --Epìdosis 10:23, 28 August 2018 (UTC)
Removing is simple and quick; adding is not. Is there any replacement? The use of catalog (P972) for the same purposes is also prohibited, as I see. So how can we look up e.g. the decade category for a year category? Or the (meta)categories for geo-regular ones? --Avatar6 (talk) 10:54, 28 August 2018 (UTC)
  • I think the P971 statements aren't optimal. Something like "settlement", "by inception", "2018" would be preferable.
    --- Jura 10:30, 28 August 2018 (UTC)
    Why "settlement" if it is named "populated places", which is in P4224? "By inception" is a METAcategory criterion, but these are not metacategories. --Avatar6 (talk)
    • yes, "inception" is actually better, but "populated places" can be in there. P4224 is unrelated. I don't see why "calendar" would be there though.
      --- Jura 07:39, 30 August 2018 (UTC)

Transitivity of P4224

I had a discussion with Avatar6 regarding the possibility of using P4224:Q5 for categories like Category:Spanish people by occupation (Q7088137). My understanding is that P4224 is intended to specify the types of elements that are normally included in that category. Categories like Category:Spanish people by occupation (Q7088137) are, by definition, not intended to contain anything but subcategories. My opponent believes that we can specify P4224:Q5 here because "P4224 means that category or its subcategories contains statement:P4224 or its subclasses". What do you think? --Ghuron (talk) 04:43, 30 September 2018 (UTC)

@Ghuron: He's right, you're wrong.
But we should maybe come up with some way to indicate that a category is normally completely diffused. Jheald (talk) 08:23, 30 September 2018 (UTC)
As Jheald. --Epìdosis 10:42, 30 September 2018 (UTC)
@Jheald, Epìdosis: I would certainly accept the consensus here; in fact his approach would technically be easier for me, but I'd like to get some clarifications. Transitivity of categories is not the #1 priority in local wikis; I frequently encounter cases like en:Category:Soprano Arias being included in en:Category:Sopranos. Having said that, and considering that any Wikidata item for a category can have its own P4224 statement, why exactly do we need transitivity of P4224? --Ghuron (talk) 06:40, 1 October 2018 (UTC)
@Ghuron: Sorry that I was blunt above, you do deserve a bit better and an explanation. For me there are a few issues here. First, if we think how these are going to appear e.g. in a Commons infobox, giving a useful breakdown of the title of the category into distinct atomic elements that may be translated into the reader's own language, it's useful for that breakdown to give a complete statement of what the category stands for, and it's useful to go to the level of distinct atomic concepts. Second, if we think of a new file being trickled down the category hierarchy until it finds the category where it belongs, it's useful to indicate at each level the specification for the category and its subcategories, to test the file against. Third, if we're researching how inheritance works (and doesn't) going down the category hierarchy, that's easier with a full spec of what the category represents.
Category meanings aren't transitive -- or at least, not as straightforwardly as subclass of (P279). You tend to find you get a series of sequential refinements, but then you get to a 'knuckle' where the structure radiates off into all sorts of different directions. This is because a category tries to capture things related to X, rather than just subdivisions of X. But the test of whether a file (or article) being categorised should continue to trickle down through and beyond this category is a good one, I think, when considering what ought to go in to a category contains (P4224) specification. Jheald (talk) 09:01, 1 October 2018 (UTC)
When I look at the infobox of commons:Category:People of Spain by occupation, P971 provides a much more "useful breakdown of the title of the category" than just P4224:Q5. I'm not sure I fully understand how this file categorization thing really works, but there is nothing here that makes my life more difficult, so I'm OK with either decision. I guess if we do not hear any objections, we can consider this a consensus --Ghuron (talk) 08:58, 2 October 2018 (UTC)
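The "trickle down" test mentioned in this thread can be illustrated with a toy model. This is only a sketch under heavy assumptions: the category names, the idea of a `spec` as a plain set of required attributes, and the function `trickle_down` are all hypothetical simplifications, not how Wikidata or Commons actually evaluate category contains (P4224).

```python
def trickle_down(facts, category, tree):
    """Return the deepest categories whose specification the item still satisfies.

    `facts` is the set of attributes known about the item; each category has
    an optional `spec` (set of required attributes) and optional `children`.
    """
    node = tree[category]
    spec = node.get("spec")
    if spec is None or not spec <= facts:
        return []  # no spec to test against, or the item fails it: stop here
    deeper = [
        hit
        for child in node.get("children", [])
        for hit in trickle_down(facts, child, tree)
    ]
    return deeper or [category]


# Hypothetical hierarchy; the container category carries a spec of its own.
tree = {
    "Spanish people by occupation": {
        "spec": {"human", "Spain"},
        "children": ["Spanish painters", "Spanish sopranos"],
    },
    "Spanish painters": {"spec": {"human", "Spain", "painter"}},
    "Spanish sopranos": {"spec": {"human", "Spain", "soprano"}},
}

painter = {"human", "Spain", "painter"}
placement = trickle_down(painter, "Spanish people by occupation", tree)

# The same hierarchy, but with no spec on the container category.
tree_without_spec = dict(tree)
tree_without_spec["Spanish people by occupation"] = {
    "children": ["Spanish painters", "Spanish sopranos"],
}
blocked = trickle_down(painter, "Spanish people by occupation", tree_without_spec)
```

In this toy model the walk cannot pass through a container category that has no specification, which is one way to read the argument for stating P4224 on categories that hold only subcategories.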