Wikidata talk:WikiProject Categories

From Wikidata
WikiProject Categories
WikiProject to solve any issues regarding categories.


Stubs, templates and category combines topics

Hi everyone! So, on this first day of the WikiProject, I want to start discussing an issue that creates a lot of constraint violations.

So, at the moment there are 16465 cases of instance of (P31) Wikimedia category of stubs (Q24046192) + category combines topics (P971) Wikipedia:Stub (Q4663261). However, 12511 of them are constraint violations because category combines topics (P971) has only that value, even though it should have multiple values; the remaining items have category combines topics (P971) Wikipedia:Stub (Q4663261) plus other category combines topics (P971) values indicating the stubs' topics.
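The violation count above can be reproduced in miniature. This is a minimal sketch that assumes a hypothetical in-memory snapshot in which each item maps property IDs to lists of value IDs (not the real Wikibase JSON format); it flags stub categories whose only P971 value is Wikipedia:Stub (Q4663261). `Q_TOPIC` is a placeholder, not a real item.

```python
STUB_CATEGORY = "Q24046192"  # Wikimedia category of stubs
WIKIPEDIA_STUB = "Q4663261"  # Wikipedia:Stub


def is_lone_stub_topic(item):
    """True if item is a stub category whose only P971 value is Wikipedia:Stub."""
    return (
        STUB_CATEGORY in item.get("P31", [])
        and item.get("P971", []) == [WIKIPEDIA_STUB]
    )


def count_violations(items):
    """Count stub categories that break the multi-value expectation on P971."""
    return sum(1 for item in items if is_lone_stub_topic(item))


# Hypothetical snapshot: each item maps property IDs to lists of value IDs.
items = [
    {"P31": [STUB_CATEGORY], "P971": [WIKIPEDIA_STUB]},             # violation
    {"P31": [STUB_CATEGORY], "P971": [WIKIPEDIA_STUB, "Q_TOPIC"]},  # has a topic
]
print(count_violations(items))  # 1
```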

  1. My proposal is to replace category combines topics (P971) Wikipedia:Stub (Q4663261) with category contains (P4224) Wikipedia:Stub (Q4663261), and to replace the remaining category combines topics (P971) values with main subject (P921) used as a qualifier of category contains (P4224) Wikipedia:Stub (Q4663261).
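The proposed stub migration can be sketched as a pure transformation on one item. This assumes a hypothetical in-memory model (property ID mapped to a list of values, with P4224 statements stored as `{"value": ..., "qualifiers": {...}}`); it is not a bot or a Wikibase API call, and `Q_TOPIC` is a placeholder.

```python
import copy

WIKIPEDIA_STUB = "Q4663261"  # Wikipedia:Stub


def migrate_stub_category(item):
    """Apply the proposed stub model to a copy of item (sketch only)."""
    new_item = copy.deepcopy(item)
    p971 = new_item.get("P971", [])
    if WIKIPEDIA_STUB not in p971:
        return new_item  # not a stub-style item: leave it unchanged
    # Every other P971 value becomes a main subject (P921) qualifier.
    topics = [value for value in p971 if value != WIKIPEDIA_STUB]
    del new_item["P971"]
    qualifiers = {"P921": topics} if topics else {}
    new_item.setdefault("P4224", []).append(
        {"value": WIKIPEDIA_STUB, "qualifiers": qualifiers}
    )
    return new_item


before = {"P31": ["Q24046192"], "P971": [WIKIPEDIA_STUB, "Q_TOPIC"]}
print(migrate_stub_category(before))
# {'P31': ['Q24046192'], 'P4224': [{'value': 'Q4663261', 'qualifiers': {'P921': ['Q_TOPIC']}}]}
```

Working on a deep copy keeps the original item intact, which makes it easy to diff the before and after states when reviewing a batch.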

The same problem affects 216 cases of instance of (P31) Wikimedia templates category (Q23894233) + category combines topics (P971) Wikimedia template (Q11266439) (56 of which are constraint violations of category combines topics (P971)) and 34442 cases of instance of (P31) Wikimedia category (Q4167836) + category combines topics (P971) Wikimedia template (Q11266439) (34162 of which are constraint violations of category combines topics (P971)).

  1. My proposal is to replace instance of (P31) Wikimedia category (Q4167836) with instance of (P31) Wikimedia templates category (Q23894233), to replace category combines topics (P971) Wikimedia template (Q11266439) with category contains (P4224) Wikimedia template (Q11266439), and to replace the remaining category combines topics (P971) values with main subject (P921) used as a qualifier of category contains (P4224) Wikimedia template (Q11266439).
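The template proposal adds a step to the stub case: P31 is narrowed as well. Here is a sketch over a hypothetical in-memory model (property ID mapped to a list of values; P4224 stored as value-plus-qualifiers dicts). `Q_TOPIC` is a placeholder, and items that already use Wikimedia templates category (Q23894233) keep their P31 as is.

```python
import copy

GENERIC_CATEGORY = "Q4167836"     # Wikimedia category
TEMPLATES_CATEGORY = "Q23894233"  # Wikimedia templates category
TEMPLATE = "Q11266439"            # Wikimedia template


def migrate_template_category(item):
    """Apply the proposed template-category model to a copy of item (sketch)."""
    new_item = copy.deepcopy(item)
    p971 = new_item.get("P971", [])
    if TEMPLATE not in p971:
        return new_item
    # Step 1: narrow P31 from the generic category class to the templates one.
    new_item["P31"] = [
        TEMPLATES_CATEGORY if value == GENERIC_CATEGORY else value
        for value in new_item.get("P31", [])
    ]
    # Steps 2 and 3: move the template value to P4224, other topics to P921.
    topics = [value for value in p971 if value != TEMPLATE]
    del new_item["P971"]
    qualifiers = {"P921": topics} if topics else {}
    new_item.setdefault("P4224", []).append({"value": TEMPLATE, "qualifiers": qualifiers})
    return new_item


before = {"P31": [GENERIC_CATEGORY], "P971": [TEMPLATE, "Q_TOPIC"]}
print(migrate_template_category(before)["P31"])  # ['Q23894233']
```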

Thank you for your interest in categories! Bye, --Epìdosis 13:04, 8 July 2018 (UTC)

Jura
PKM
ValterVB
Jheald
Ghuron
Infovarius
Sannita
Avatar6
Pasleim


Notified participants of WikiProject Categories. Please give your opinion! --Epìdosis 12:29, 11 July 2018 (UTC)

Hi all! Some thoughts on the topic of template categories: 1. The item Wikimedia templates category (Q23894233) and its subclasses were created and used much earlier than P4224. 2. ...so some wikis use property P31 in the logic of their category-related templates; deleting the item would break that logic. 3. A class name such as Wikimedia navboxes category is much easier to understand, at least in some languages, than a statement constructed from two or more Wikidata properties. 4. Some wikis use pseudo-namespaces within the category namespace, e.g. Категория:Навигационные шаблоны:<topic> ("Category:Navigational templates:<topic>") and Категория:Шаблоны:<topic> ("Category:Templates:<topic>") in ruwiki and its satellite East Slavic wikis. --Avatar6 (talk) 12:52, 30 July 2018 (UTC)

Metacategory vs. Category

We previously had a short chat about this with @Jura1:, but I believe we can now involve a few other interested participants here. It would be very convenient to somehow "mark" metacategories (categories that are intended to contain only other categories). Although I understand that we don't want to "mirror the structure of Wikipedia categories with P31/P279 on Wikidata", I see nothing particularly wrong with marking such items as P31:Q30432511 (instead of P31:Q4167836) --Ghuron (talk) 12:26, 11 July 2018 (UTC)

I support instance of (P31) metacategory in Wikimedia projects (Q30432511), just as I support instance of (P31) Wikimedia templates category (Q23894233), because these categories contain not articles but, respectively, other categories and templates. --Epìdosis 12:33, 11 July 2018 (UTC)
  • How would we know what Wikipedia/etc. users put in there? Would we have to monitor them, redoing P31 periodically based on people's usage? Why complicate things for users who want to check a structure by splitting the topic among several properties? The point of a flat P971 is that it's flat ;)
    --- Jura 12:39, 11 July 2018 (UTC)
    • @Jura1: there is nothing wrong with a bot assigning instance of (P31) Wikimedia category (Q4167836) to all new categories (including metacategories). But if someone notices that a particular category is indeed a metacategory, they can narrow down the P31 value. I have nothing against P971, and I'm using it extensively. The problem is that there is no universal way to express a "this is a metacategory" statement via P971. Instead I can say category combines topics (P971) by country (Q19360703) or category combines topics (P971) by city (Q18683478) or category combines topics (P971) by genre (Q42903116). And if I want to get "all non-meta-categories", I will have to use FILTER NOT EXISTS {?cat wdt:P971/wdt:P31 wd:Q24571886}? It's gonna be slow :( --Ghuron (talk) 13:05, 11 July 2018 (UTC)
      • The problem is that people may not want any category items, so simply excluding any P31 with a given value is much easier.
        It's still not clear how you decide which categories contain only categories.
        --- Jura 13:12, 11 July 2018 (UTC)
        • Well, judging mostly from the title (e.g. I assume that Category:Argentine women by occupation (Q8262772) is intended to contain other categories). More formally, there is Q4048796, which is used quite extensively, at least on en-wiki:
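          PREFIX mw: <http://tools.wmflabs.org/mw2sparql/ontology#>
          # Plain Wikimedia category items whose en-wiki page sits in Category:Container_categories
          SELECT ?item ?page WHERE {
            hint:Query hint:optimizer "None" .   # evaluate in written order, so the SERVICE call runs first
            SERVICE <http://tools.wmflabs.org/mw2sparql/sparql> {
              # en-wiki pages that are members of Category:Container_categories
              ?page mw:inCategory <https://en.wikipedia.org/wiki/Category:Container_categories> .
            }
            # map each page to its Wikidata item and keep only P31 = Wikimedia category
            ?page schema:about ?item . ?item wdt:P31 wd:Q4167836
          }

          --Ghuron (talk) 13:29, 11 July 2018 (UTC)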
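The exclusion Ghuron describes (FILTER NOT EXISTS {?cat wdt:P971/wdt:P31 wd:Q24571886}) can be mirrored in plain Python to make the logic explicit. This is a sketch over hypothetical dictionaries, not a query against the Wikidata Query Service; it treats by country (Q19360703) as an instance of meta category criterion (Q24571886), as the thread assumes, and the `Q_CAT_*`/`Q_TOPIC` IDs are placeholders.

```python
META_CRITERION = "Q24571886"  # meta category criterion


def non_meta_categories(p971_by_category, p31_by_item):
    """Keep categories none of whose P971 values is an instance of META_CRITERION.

    Plain-Python analogue of
    FILTER NOT EXISTS { ?cat wdt:P971/wdt:P31 wd:Q24571886 }.
    """
    return [
        category
        for category, topics in p971_by_category.items()
        if not any(META_CRITERION in p31_by_item.get(topic, []) for topic in topics)
    ]


# Hypothetical data: P31 of each topic item, and P971 of each category item.
p31_by_item = {"Q19360703": [META_CRITERION]}  # by country
p971_by_category = {
    "Q_CAT_BY_COUNTRY": ["Q_TOPIC", "Q19360703"],
    "Q_CAT_PLAIN": ["Q_TOPIC"],
}
print(non_meta_categories(p971_by_category, p31_by_item))  # ['Q_CAT_PLAIN']
```

The extra lookup per P971 value (the second hop of the property path) is what makes this pattern costlier than a single P31 check on the category itself.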
Perhaps what we need is to focus more narrowly on categories of the type X by Y, the topic of a recent discussion at Wikidata:Project_chat#Category:A_[split_up_by]_B.
For the record, I also dislike the idea of introducing lots of subclasses of Wikimedia category (Q4167836). I would prefer to indicate contents or attributes of the category by using additional statements. Jheald (talk) 17:21, 11 July 2018 (UTC)
One idea might be to identify such classes using something like
<category item> category combines topics (P971) "X"
<category item> category combines topics (P971) "Y" object has role (P3831) "Partition class"
Category items with statements of this pattern would be fairly easy to require or to exclude in a query. Jheald (talk) 17:37, 11 July 2018 (UTC)
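To show why this pattern is easy to require or exclude, here is a sketch that splits categories by whether any P971 statement carries object has role (P3831) = "Partition class". The in-memory statement layout and the `Q_PARTITION_CLASS`, `Q_X`, `Q_Y` and `Q_CAT_*` IDs are hypothetical placeholders; no item for "Partition class" is named in the discussion.

```python
PARTITION_CLASS = "Q_PARTITION_CLASS"  # hypothetical item for the "Partition class" role


def is_partitioned(p971_statements):
    """True if any P971 statement has object has role (P3831) = partition class."""
    return any(
        PARTITION_CLASS in statement.get("qualifiers", {}).get("P3831", [])
        for statement in p971_statements
    )


def split_categories(p971_by_category):
    """Split categories into (partitioned, plain) lists, preserving input order."""
    partitioned, plain = [], []
    for category, statements in p971_by_category.items():
        (partitioned if is_partitioned(statements) else plain).append(category)
    return partitioned, plain


p971_by_category = {
    # "X by Y": X is a plain topic, Y carries the partition-class role
    "Q_CAT_X_BY_Y": [
        {"value": "Q_X"},
        {"value": "Q_Y", "qualifiers": {"P3831": [PARTITION_CLASS]}},
    ],
    "Q_CAT_PLAIN": [{"value": "Q_X"}],
}
print(split_categories(p971_by_category))  # (['Q_CAT_X_BY_Y'], ['Q_CAT_PLAIN'])
```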
As an exploration of what we have to deal with, here's a quick query looking at 30,000 categories with en-labels of this form, to see the sort of partitioning classes that may be most relevant. tinyurl.com/y85gqesz. "By country" (4498), "by year" (807), "by nationality" (771), "by state" (755) lead the list. Jheald (talk) 18:01, 11 July 2018 (UTC)
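The tally of partitioning classes can be approximated locally from category labels. This sketch assumes English labels ending in "by <something>" and uses labels mentioned in this thread as sample input; it does not reproduce the linked query, and its counts are only illustrative.

```python
import re
from collections import Counter

# The greedy ".*" makes the match start at the LAST word "by" in the label.
BY_SUFFIX = re.compile(r".*\bby (.+)$")


def partition_counts(labels):
    """Tally the "by ..." suffixes of labels such as "Category:Albums by year"."""
    counts = Counter()
    for label in labels:
        match = BY_SUFFIX.match(label)
        if match:
            counts["by " + match.group(1)] += 1
    return counts


labels = [
    "Category:Albums by year",
    "Category:Geography by country",
    "Category:Rivers by country",
    "Category:Argentine women by occupation",
    "Category:Populated places established in 2018",  # no "by": ignored
]
print(partition_counts(labels).most_common())
# [('by country', 2), ('by year', 1), ('by occupation', 1)]
```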
For what it's worth, here are the current uses of category combines topics (P971) on such categories:
Jheald (talk) 18:28, 11 July 2018 (UTC)
The "by ..." items are all members of meta category criterion (Q24571886), created by User:Shinnin. According to Reasonator, there are 38 items currently in this class. It's an interesting approach, but I don't think it scales well -- I think the qualifier I have suggested above would be a more general approach. On the other hand, perhaps it doesn't need to scale very far -- the number of partitioning classes we need to cope with is fairly finite, at least judging by the query above.
Pinging @Shinnin: Are there any particular advantages of your model that you would like to bring into the discussion? Jheald (talk) 18:40, 11 July 2018 (UTC)
@Jheald: I didn't create meta category criterion (Q24571886). "By <something>" type items existed in Wikidata before I started editing here. I think by year (Q29053180) is my creation; the rest are not. I've used these types of items mainly because they seemed to be the de facto way of modeling these types of categories.
I do think that after category contains (P4224) was created, many of the current use cases could have been changed to use it instead of P971, e.g. Category:Albums by year (Q6695739): category contains (P4224) album (Q482994) / grouped by: year (Q577). However, this approach would only work for categories that group articles based on their type, not for those that group them based on a common topic (e.g. Category:Geography by country (Q6491485)). All in all, I'm not an advocate of the current system. --Shinnin (talk) 20:10, 11 July 2018 (UTC)
I can see your point now, and I believe I need to clarify what I'm trying to do. I want to exclude categories similar to X by Y from the scope of my queries. Although I believe that setting instance of (P31) metacategory in Wikimedia projects (Q30432511) for them is the most efficient way to achieve that, I'm not against any other way of labeling them that would fulfill my needs. We've been discussing the idea of using P971 in the thread above, and I still do not see an efficient way to exclude them (compare this to this) --Ghuron (talk) 18:45, 11 July 2018 (UTC)
@Ghuron: I'm not sure that I would read too much into that comparison, unless that count is literally all you want to do, in which case you can calculate a count excluding a particular subset efficiently simply by subtraction.
The first query is fast because the query engine never has to materialise the items; it can just count the difference between two index positions.
As soon as you are wanting anything more concrete (typically involving a more restricted solution set), the difference in time between the two queries would be a lot smaller. Jheald (talk) 20:49, 11 July 2018 (UTC)
Also worth noting that this query is not particularly happy either. Jheald (talk) 20:56, 11 July 2018 (UTC)
Fair point about the count. Let's assume I want to get labels for all categories without P4224 (for a machine learning experiment). No single query can return 4M records in 60 seconds, so I'm using LIMIT/OFFSET. Let's see how well each of the discussed schemas fits here:
One might argue that my task is rare, but even if we accept this, I still fail to see any reasons against using instance of (P31) metacategory in Wikimedia projects (Q30432511) other than purely aesthetic ones (which are very subjective) --Ghuron (talk) 11:45, 12 July 2018 (UTC)
@Ghuron: It's not a great surprise that putting a LIMIT 50000 on a COUNT query with a one-line answer fails to be particularly effective :-)
Starting with this (or its OPTIONAL { ... } FILTER (!bound(...)) alternative) might be more interesting. Jheald (talk) 13:03, 12 July 2018 (UTC)
@Jheald: yep, too many query windows open, but I'm still getting a timeout on OFFSET 1000000 LIMIT 1. I couldn't figure out how to use OPTIONAL { ... } FILTER (!bound(...)) here because (unlike P4224) we expect several values on P971, and having a P3831 qualifier on ANY of them should eliminate the category from the output --Ghuron (talk) 13:09, 12 July 2018 (UTC)
@Ghuron: I would do it this way: generate a controlled number of categories first, then start applying tests to them. Jheald (talk) 13:26, 12 July 2018 (UTC)
On a tranche of 500,000 categories, the hash join to exclude p:P971/pq:P3831 is adding about 8 seconds (17 seconds vs 9 seconds). Jheald (talk) 13:31, 12 July 2018 (UTC)
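Jheald's "generate a controlled number of categories first, then apply tests" approach can be sketched as client-side control flow. Here `fetch_batch(offset, limit)` is a hypothetical callback standing in for one bounded query window; this is an illustration under that assumption, not how the query engine evaluates a single query internally.

```python
def filter_in_tranches(fetch_batch, keep, tranche_size):
    """Fetch category IDs one bounded tranche at a time, then apply the test.

    fetch_batch(offset, limit) is a hypothetical callback that returns a list
    of at most `limit` IDs starting at `offset` (an empty list when exhausted).
    """
    kept, offset = [], 0
    while True:
        batch = fetch_batch(offset, tranche_size)
        if not batch:
            break
        kept.extend(category for category in batch if keep(category))
        offset += tranche_size
    return kept


# Hypothetical data: ten category IDs, the even-numbered ones "partitioned".
all_ids = [f"Q_CAT_{n}" for n in range(10)]
partitioned = {f"Q_CAT_{n}" for n in range(0, 10, 2)}
result = filter_in_tranches(
    lambda offset, limit: all_ids[offset:offset + limit],
    lambda category: category not in partitioned,
    tranche_size=4,
)
print(result)  # ['Q_CAT_1', 'Q_CAT_3', 'Q_CAT_5', 'Q_CAT_7', 'Q_CAT_9']
```

Keeping each tranche bounded means the cost of the exclusion test grows with the tranche size rather than with the full set of about 4M categories.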
This variant is a bit slower, at 29 seconds. Jheald (talk) 13:36, 12 July 2018 (UTC)
@Jheald: as long as it fits within the 60 s timeout, a few seconds slower doesn't matter. Your approach works for both the P3831 qualifier and by country (Q19360703)/by city (Q18683478)/by genre (Q42903116)/etc (see [2]). But my understanding was that if something can be expressed without qualifiers, it should be expressed that way. Shouldn't we use the children of meta category criterion (Q24571886)? --Ghuron (talk) 13:47, 12 July 2018 (UTC)
@Ghuron: I would delete that entire class, because I think it just causes difficulty and confusion -- starting with the name, which is deeply opaque. IMO, if something is being used as the partitioning class, it is better to use the regular class item for that, with a qualifier to say that that is its role, rather than to expect people to create and consistently use different parallel items. Jheald (talk) 14:13, 12 July 2018 (UTC)
@Jheald, Jura1, Epìdosis: So apparently we have 3 or 4 competing proposals here (not counting what was discussed in Wikidata:Project_chat#Category:A_[split_up_by]_B). Since any of them will work for me, I don't really care which one is selected, but I do want us to select one (so I can start using it). Please advise how we should proceed from here --Ghuron (talk) 17:12, 12 July 2018 (UTC)

A metacategory cannot be a strict class on wikis; it is only intended to be one. If a metacategory contains pages, that is not an error, just a warning, because it can happen for many reasons: e.g. the wiki does not have a suitable subcategory for that metacategory, either because it does not exist yet or because such a subcategory name would not be very useful, at least for now. --Avatar6 (talk) 13:13, 30 July 2018 (UTC)

Triple and double information

Hi all! I want to restart the previous discussion from another point of view: redundant information. These are the cases:

In your opinion how should we deal with these? Thank you, --Epìdosis 19:52, 29 July 2018 (UTC)

  • I think P4224 was created knowing that it would duplicate P971. It's just meant to provide some different functionality. It may or may not be useful for stub categories. Maybe one shouldn't fill P4224 unless actually needed.
    --- Jura 05:39, 30 July 2018 (UTC)
  • I do not see huge issues with duplicated information between P971 and P4224. Of course, once a data model for categories is agreed on and accepted, it will be more elegant than the examples above. I prefer to work with P4224, Jura with P971, so the examples above merely represent "work in progress" --Ghuron (talk) 06:40, 30 July 2018 (UTC)

Template categories - find a solution

Hi all! I'm writing again to try to solve the problem of template categories. My proposal is:

Template category type | instance of (P31) | category combines topics (P971) + category contains (P4224)
General purpose | Wikimedia templates category (Q23894233) | Wikimedia template (Q11266439)
Navboxes | Wikimedia templates category (Q23894233) | Wikimedia navigational template (Q11753321)
Infoboxes | Wikimedia templates category (Q23894233) | Wikimedia infobox template (Q19887878)
Userboxes | Wikimedia templates category (Q23894233) | Wikimedia userbox template (Q20769160)
Babel | Wikimedia templates category (Q23894233) | Wikimedia user language template (Q19842659)

And, in my opinion, Wikimedia navboxes category (Q13331174) and Wikimedia infobox templates category (Q23894246) should be deleted. Here you can see how many times they are used. What's your opinion? --Epìdosis 08:30, 25 August 2018 (UTC)

You are right that Wikimedia navboxes category (Q13331174) and Wikimedia infobox templates category (Q23894246) seem to be redundant. But they preserve category classes in parallel with category contains (P4224) and category combines topics (P971), and they were created for, and are currently used in, the template uk:Шаблон:Категорія шаблонів ("Template:Category of templates") to classify template categories by "pseudo category-namespaces". Besides, they can be used to simplify descriptions, making them more specific and more exact than the overused description "Wikimedia category". --Avatar6 (talk) 06:41, 28 August 2018 (UTC)

Wikidata property usage tracking categories

Hi all! How would you describe a Wikidata property usage tracking category (Q24514938)? At the moment two different systems are used:

  1. instance of (P31)  Wikimedia administration category (Q15647814) + category combines topics (P971)  Wikidata property usage tracking category (Q24514938) (around 2700 cases)
  2. instance of (P31)  Wikidata property usage tracking category (Q24514938) (around 50 cases)

I'm not sure which is better. What's your opinion? --Epìdosis 08:40, 25 August 2018 (UTC)

For me, the 2nd way is simpler and more exact. A sole P31 is always better, I think. P971 should contain the PID and the tracking case (same as WD, differs from WD, no WD...). --Avatar6 (talk) 06:48, 28 August 2018 (UTC)
In the second case, category combines topics (P971) value from Wikidata (Q40218570) (and the other 3 similar ones) should migrate to category contains (P4224) value from Wikidata (Q40218570), and category combines topics (P971) Wikidata property usage tracking category (Q24514938) should obviously be removed. Is that correct, @Avatar6, Jura1:? --Epìdosis 09:49, 31 August 2018 (UTC)
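The conversion from the first system to the second, as proposed in this comment, can be sketched on a hypothetical in-memory item (property ID mapped to a list of value IDs). Only value from Wikidata (Q40218570) is listed as a tracking case because the other three similar items are not named here; `Q_PROPERTY` is a placeholder for whatever other P971 value the item carries, and P4224 is kept as a plain value list for brevity.

```python
import copy

ADMIN_CATEGORY = "Q15647814"     # Wikimedia administration category
TRACKING_CATEGORY = "Q24514938"  # Wikidata property usage tracking category
# value from Wikidata; the three similar items are not named in this thread.
TRACKING_CASES = {"Q40218570"}


def migrate_tracking_category(item):
    """Convert the first system into the second, as proposed (sketch only)."""
    new_item = copy.deepcopy(item)
    p971 = new_item.get("P971", [])
    if TRACKING_CATEGORY not in p971:
        return new_item
    # P31: administration category -> property usage tracking category.
    new_item["P31"] = [
        TRACKING_CATEGORY if value == ADMIN_CATEGORY else value
        for value in new_item.get("P31", [])
    ]
    cases = [value for value in p971 if value in TRACKING_CASES]
    rest = [v for v in p971 if v != TRACKING_CATEGORY and v not in TRACKING_CASES]
    if rest:
        new_item["P971"] = rest  # other topics stay in P971
    else:
        del new_item["P971"]
    if cases:
        new_item.setdefault("P4224", []).extend(cases)  # tracking case -> P4224
    return new_item


before = {"P31": [ADMIN_CATEGORY], "P971": [TRACKING_CATEGORY, "Q40218570", "Q_PROPERTY"]}
print(migrate_tracking_category(before))
# {'P31': ['Q24514938'], 'P971': ['Q_PROPERTY'], 'P4224': ['Q40218570']}
```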

I support this change, as it makes things easier:

MisterSynergy (talk) 19:51, 31 August 2018 (UTC)

Thank you. If there is no opposition, I will do it on the 7th of September. --Epìdosis 09:46, 1 September 2018 (UTC)
  • I don't really care what you do with P4224, but it seems that the proposal doesn't meet the English description of the property: "category contains elements that are instances of this item". As for P971, please don't remove the value from there.
    --- Jura 10:14, 1 September 2018 (UTC)

Metacategorization of categories

For serial categories by date, it really helps. Please see the idea at Category:Populated places established in 2018 (Q48328925), Category:Populated places established in 2017 (Q28605372), Category:Populated places established in 2016 (Q24089347), Category:Populated places established in the 2010s (Q7476468). --Avatar6 (talk) 06:16, 28 August 2018 (UTC)
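The year and decade categories linked above follow a regular English naming pattern, so the decade category for a year category can at least be derived from the label. This is a label-based sketch that assumes labels of the form "... in YYYY" map to "... in the YYY0s", as in the four examples; it is a naming heuristic, not a Wikidata property or statement.

```python
import re


def decade_label(year):
    """Return the decade label for a positive year, e.g. 2018 -> '2010s'."""
    if year < 1:
        raise ValueError("only positive years are handled in this sketch")
    return f"{year - year % 10}s"


def decade_category(year_category):
    """Map 'Category:... in 2018' to 'Category:... in the 2010s' (assumed naming rule)."""
    match = re.fullmatch(r"(.*) in (\d{1,4})", year_category)
    if match is None:
        raise ValueError(f"no trailing year in {year_category!r}")
    prefix, year = match.group(1), int(match.group(2))
    return f"{prefix} in the {decade_label(year)}"


print(decade_category("Category:Populated places established in 2018"))
# Category:Populated places established in the 2010s
```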

@Avatar6: The use of subclass of (P279) for category items is deprecated, so it should be removed; regarding the other statements in the items, I support them. --Epìdosis 10:23, 28 August 2018 (UTC)
Removing it is quick and simple, but is there any replacement? The use of catalog (P972) for the same purposes is also prohibited, as far as I see. So how can we find, e.g., the decade category for a year category? Or the (meta)categories for geo-regular ones? --Avatar6 (talk) 10:54, 28 August 2018 (UTC)
  • I think the P971 statements aren't optimal. Something like "settlement", "by inception", "2018" would be preferable.
    --- Jura 10:30, 28 August 2018 (UTC)
    Why "settlement" if it is named "populated places", which is in P4224? "By inception" is a METAcategory criterion, but these are not metacategories. --Avatar6 (talk)
    • yes, "inception" is actually better, but "populated places" can be in there. P4224 is unrelated. I don't see why "calendar" would be there though.
      --- Jura 07:39, 30 August 2018 (UTC)