Topic on User talk:JakobVoss

TomT0m (talkcontribs)

Can you explain edits like this one? https://www.wikidata.org/w/index.php?title=Q23056371&diff=prev&oldid=546207667

It seems that a French postal code like « 12345 » is a postal code in the sense that « subclass of » means: any French postal code is a postal code.

So it seems to me that « French postal code » instance of « postal code » does not make any sense, but « subclass of » does.

Something like « French postal code » instance of « national postal code system », on the other hand, would make sense.

JakobVoss (talkcontribs)

The question of when to use instance of (P31) and when to use subclass of (P279) cannot be answered by ontological arguments alone; it also depends on the current state of Wikidata as a whole. As long as Wikidata does not contain an item for each individual French postal code (such as « 12345 »), the ontological class of French postal codes will never have instances. Instead of having an empty Wikidata class, it is more useful to model each individual system of postal codes (such as Postal codes in France (Q1105640)) as a Wikidata instance of the Wikidata class of all postal code systems.

TomT0m (talkcontribs)

Building an ontological mess is building a mess. Just create that damn class; this is far simpler than having to ask yourself each time « are there instances on Wikidata? », which is a non-trivial question whose answer can change over time. If that happens, will you change all the statements? Better to be right the first time.

JakobVoss (talkcontribs)

There is no "right" in Wikidata; it's not about facts but about statements. Furthermore, reality is non-trivial and can change over time, and so does Wikidata.

JakobVoss (talkcontribs)

Maybe Wikidata:Identifiers helps to explain. What an identifier actually is, is not obvious. I'd like to stress that there is no identifier without an identifier system (unlike names, which can also stand alone). It would be very complicated to have Wikidata items for both "System of French postal codes" and "French postal code character strings", so we had better treat them as one item.

Ogoorcs (talkcontribs)

I don't think Wikidata could ever contain items about instances of postal codes, at least as long as its well-formed formulas consist of subject-property-object triples:

suppose Wikidata stores items about instances of postal codes; then why couldn't it contain items for instances of any UID? Why couldn't it contain items for instances of Wikidata item identifiers (i.e. Q123)?

Of course we could do this, but we would soon start encountering many recursion problems that the Wikidata language can't even express.

Anyway, the elements (instances) of the class of postal codes are postal codes. Postal codes in France (Q1105640) is the set (class) of postal codes issued by France, so it is a subset of postal codes, and we should use subclass of (P279) and not instance of (P31).

Nevertheless I see your point here: how could we retrieve French postal codes by Wikidata means if this class is empty?

I think this is a bigger problem. I think we need tools to populate the class without creating new elements, maybe by connecting the answer of the query « subclass of French postal codes » to that of the query « postal codes of towns located in France ». Tools like that could also solve the problem of expressing the same concepts through different properties. I would support a proposal in this direction.

PS: I think you should define ontological class.

Ogoorcs (talkcontribs)

>Furthermore reality is a non trivial and can change over time, so does Wikidata.

Wikidata can't express all of reality because its language is not even a first-order language and it does not admit recursion, so a lot of mathematical concepts cannot be defined on Wikidata.

You can feel the limitations with this definition:

In ZF set theory, an ordered n-tuple is an unordered pair (a set composed of two elements) consisting of the ordered (n-1)-tuple and the n-th element.

Try to express this definition in tuple (Q600590) without using recursion.
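
For reference, the recursive definition alluded to above is commonly written as follows. This is a sketch that assumes Kuratowski's encoding of ordered pairs; the post itself does not name an encoding, so that choice is an assumption.

```latex
\begin{align*}
(a, b) &:= \{\{a\}, \{a, b\}\} \\
(a_1, \dots, a_n) &:= \bigl((a_1, \dots, a_{n-1}),\, a_n\bigr) \quad \text{for } n > 2
\end{align*}
```

The second line is the recursive step: the n-tuple is defined in terms of the (n-1)-tuple, which is what a flat set of subject-property-object statements cannot state directly.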

JakobVoss (talkcontribs)

How do we differentiate (if we actually need to, maybe not!) between:

  • classes of concrete identifier systems (postal codes, identifiers for people...)
  • classes of identifier systems (e.g. postal code identifier systems, identifier systems for people...)

My problem is that I see no way to define "identifier" without "identifier system", because identifiers are always part of (sic!) an identifier system.

Nikki (talkcontribs)

If a Wikimedia project decides to make a page for a particular postcode, we would be obliged to have an item for it. Even if we expect (and would prefer) no individual instances, we should still model the data in a way which allows for instances. For example, we already have 10048 (Q4546087), which a query for instances of subclasses of postal code (the usual way to find individual instances) doesn't find.

It seems to me that you're using "postal code" to mean "a postal code system" whereas the other people here understand it as "an individual postal code". Based on the English description and English Wikipedia page for postal code (Q37447), I would also interpret it as meaning an individual postal code.

Perhaps a solution that would work for everyone would be to have a new item for "postal code system"? Then things like ZIP code (Q136208) could be an instance of a postal code system (ZIP codes are a specific system) and a subclass of postal code (all ZIP codes are postal codes).
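
To make this proposal concrete, here is a minimal Python sketch over a toy set of triples. It is not real Wikidata data: the "postal code system" item does not exist yet and is a hypothetical placeholder, and the statement that 10048 (Q4546087) is an instance of ZIP code (Q136208) is assumed for illustration only. The helper `instances_of` mimics the usual pattern `?item wdt:P31/wdt:P279* ?class`.

```python
P31, P279 = "P31", "P279"            # instance of, subclass of
POSTAL_CODE = "Q37447"
ZIP_CODE = "Q136208"
CODE_10048 = "Q4546087"
# Hypothetical placeholder: the proposed new item has no Q-id yet.
POSTAL_CODE_SYSTEM = "postal code system (hypothetical item)"

triples = {
    (ZIP_CODE, P31, POSTAL_CODE_SYSTEM),   # ZIP codes are a specific system
    (ZIP_CODE, P279, POSTAL_CODE),         # all ZIP codes are postal codes
    (CODE_10048, P31, ZIP_CODE),           # assumed statement, illustration only
}


def superclasses(item, triples):
    """Return item plus everything reachable via P279 (like wdt:P279*)."""
    seen, stack = {item}, [item]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == P279 and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, triples):
    """Items matching ?item wdt:P31/wdt:P279* cls."""
    return {
        s for s, p, o in triples
        if p == P31 and cls in superclasses(o, triples)
    }


print(instances_of(POSTAL_CODE, triples))
print(instances_of(POSTAL_CODE_SYSTEM, triples))
```

Under these assumptions, the first query returns only the individual code and the second returns only the system, so the two readings of "postal code" stay separate.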

JakobVoss (talkcontribs)

Separating "postal code system" from "individual postal code" would be impractical nitpicking. Neither Wikipedia articles nor everyday language makes such a distinction. There is no single right way to model things in Wikidata, only several possibilities. Good solutions must be judged by how well they support data reuse (e.g. queries) and how well they can be applied in practice (so they must not be too difficult to understand).

Thanks for giving the example 10048 (Q4546087)! These individual identifiers are an exception but they exist. I'd like to be able to answer queries like the following:

Individual identifiers can exist as values (e.g. values of property IATA airport code (P238)) and, less often, as items (e.g. 10048 (Q4546087)). Concrete identifier systems make up most of the identifier-related Wikidata items, and types of identifiers should only be used to organize the former, or if Wikipedia articles about general types of identifiers exist.

What do you suggest to separate these three cases?

D1gggg (talkcontribs)

ZIP code could potentially be separated into several items (old format and new format).

But this would be premature for many postcodes, since historic data is a more complex question than current solutions.

JakobVoss (talkcontribs)

Some degree of fuzziness cannot be avoided because concepts change over time and have slightly different meanings in different contexts and Wikipedia editions. To summarize the example, we have three kinds of items plus the most common superclass of all identifiers:

How to query all individual identifiers?

?individualID wdt:P31 ?idSystem

How to query all postal code systems?

?idSystem wdt:P279 wd:Q37447 ; wdt:P31 "identifier system"

How to query all identifier systems?

?idSystem wdt:P279* wd:Q6545185 ; wdt:P31 "identifier system"

How to query all types of identifiers?

?idType wdt:P279* wd:Q6545185 FILTER NOT EXISTS { ?idType wdt:P31 "identifier system" }

I find this solution more difficult to apply and to make use of. As far as I know, the FILTER NOT EXISTS clause cannot be used in Lua templates. With my current approach this is easier:

How to query all individual identifiers?

?individualID wdt:P361 ?idSystem (part-of)

How to query all postal code systems?

?idSystem wdt:P31 wd:Q37447 (instance)

How to query all identifier systems?

?idSystem wdt:P31/wdt:P279* wd:Q6545185

How to query all types of identifiers?

?idType wdt:P279* wd:Q6545185
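
The four patterns of this second approach can be tried out on a toy data set. The following Python sketch is only an illustration under stated assumptions: the triples are made up for the example (in particular the part-of statement for 10048 (Q4546087)), and the identifier superclass is assumed to be Q6545185, which the patterns above write as 6545185.

```python
P31, P279, P361 = "P31", "P279", "P361"   # instance of, subclass of, part of
IDENTIFIER = "Q6545185"    # assumed Q-id of the identifier superclass
POSTAL_CODE = "Q37447"
ZIP_CODE = "Q136208"
FRENCH_CODES = "Q1105640"
CODE_10048 = "Q4546087"

# Illustrative statements only, not a dump of real Wikidata data.
TRIPLES = {
    (POSTAL_CODE, P279, IDENTIFIER),
    (ZIP_CODE, P31, POSTAL_CODE),
    (FRENCH_CODES, P31, POSTAL_CODE),
    (CODE_10048, P361, ZIP_CODE),
}


def subclasses_of(root):
    """Items matching ?x wdt:P279* root (root itself included)."""
    found, frontier = {root}, [root]
    while frontier:
        parent = frontier.pop()
        for s, p, o in TRIPLES:
            if p == P279 and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def individual_identifiers():
    """?individualID wdt:P361 ?idSystem"""
    return {s for s, p, _ in TRIPLES if p == P361}


def postal_code_systems():
    """?idSystem wdt:P31 wd:Q37447"""
    return {s for s, p, o in TRIPLES if p == P31 and o == POSTAL_CODE}


def identifier_systems():
    """?idSystem wdt:P31/wdt:P279* <identifier superclass>"""
    classes = subclasses_of(IDENTIFIER)
    return {s for s, p, o in TRIPLES if p == P31 and o in classes}


def identifier_types():
    """?idType wdt:P279* <identifier superclass>"""
    return subclasses_of(IDENTIFIER)
```

In this toy data set each pattern is a single lookup or path traversal, with no negation needed. Note that the first pattern, as written, matches any item that has a part-of statement at all.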

Maybe adding another property can help? Checking for the (non-)existence of two statements is too fragile to get reliable results. We could create a sub-property of P31 or P279 to use; this is also done for taxon data (taxon rank (P105), parent taxon (P171)...).

tl;dr: we cannot have both of the following statements; one must be changed, or there is no easy way to tell that ZIP code (Q136208) is a concrete identifier system rather than a general class of multiple systems with possibly overlapping identifier values:
