Wikidata:Requests for comment/Do we want automatic inverse statement creation and if so, how should they happen?

An editor has requested the community to provide input on "Do we want automatic inverse statement creation and if so, how should they happen?" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

At the community Wishlist there was a thread about the automatic adding of inverse statements.

After I raised the issue on the Project Chat, Lydia wrote in the Project Chat:

  • For inverse properties I looked over the discussion and I am seeing a lot of question marks still when it comes to the details like: It seems clear we don't want the automated adding of inverse properties to always happen. But when do we want it to happen? Does it only depend on the property? How about ranks and changing ranks? References? I think it'd be super valuable for someone to set up a bot to do this for a while so we can figure out all these with real data.

I understand the concern that the requirements are unclear at the moment, and I'm not clear either on what would be ideal. The idea of starting with a bot that's maybe limited to a few properties sounds promising to me. What do you think about it? Does anybody have a clear idea of how the semantics should work? ChristianKl 16:36, 30 December 2018 (UTC)

  • I wonder if there are technical changes that could be made so that inverse properties aren't needed at all, e.g., by enhancing the API so that finding an item on the right-hand side of statements is as easy as finding it on the left. Keeping duplicate data in sync will always be difficult: if you delete an incorrect value or mark it as deprecated, would a bot be able to fix the inverse? Ghouston (talk) 03:31, 31 December 2018 (UTC)
  • My bot DeltaBot is automatically adding inverse statements for 30 property pairs. I have experienced several problems so far:
    • Many inverse properties are not fully inverse, e.g. contains the administrative territorial entity (P150)/located in the administrative territorial entity (P131), father (P22)/child (P40), etc. For these cases, additional conditions have to be defined to trigger a bot action.
    • Dealing with qualifiers is very confusing. In some cases qualifiers should be copied to the inverse statement; in other cases the qualifier should be changed or dropped. DeltaBot does not add any qualifiers but leaves this work to humans.
    • How can a bot know whether a statement is correct and thus whether its inverse should be added? From time to time I receive comments on my talk page that my bot added a mistake, although my bot was only propagating the mistake. I solved this issue partially by never letting the bot add a statement twice.
    • What to do if a bot edit is undone? We don't know whether it was vandalism, and thus the original statement should be kept, or whether it was an erroneous statement, and thus the original statement should be undone as well. I've decided that the bot does not remove statements added by human users.
In general, I think we will always need human users who maintain inverse properties. Bots can take over some of the work, but many decisions have to be made by humans. Thus, like Ghouston, I favor technical changes that make inverse properties unnecessary. People who use SPARQL queries to access our data already do not need inverse properties. For the Wikidata user interface I wrote a script to make inverse properties obsolete. My plan is to improve this script over the next weeks and to turn it into a gadget. What is missing is a solution for Wikipedia. I hope the development team will in future make it possible to access either backlinks or SPARQL queries in templates. --Pasleim (talk) 11:27, 31 December 2018 (UTC)
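The DeltaBot policies described above (copy no qualifiers, never add the same inverse twice, never delete anything) can be summarised in a minimal Python sketch. The `Statement` model, the `INVERSE_OF` table and the `InverseBot` class are hypothetical simplifications for illustration, not DeltaBot's actual code or its list of 30 pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified statement: no qualifiers, ranks or references."""
    subject: str
    prop: str
    value: str


# Illustrative pairs only (spouse, opposite of, follows/followed by);
# not the pairs DeltaBot actually handles.
INVERSE_OF = {"P26": "P26", "P461": "P461", "P155": "P156", "P156": "P155"}


class InverseBot:
    def __init__(self):
        # Every inverse this bot has ever proposed; assumes each proposal is saved.
        self.written = set()

    def propose(self, stmt, existing):
        """Return the inverse Statement to add, or None if nothing should be done."""
        inv_prop = INVERSE_OF.get(stmt.prop)
        if inv_prop is None:
            return None  # not a configured pair
        inverse = Statement(stmt.value, inv_prop, stmt.subject)
        if inverse in existing:
            return None  # already present on the target item
        if inverse in self.written:
            return None  # added once and later removed: never re-add
        self.written.add(inverse)
        # Qualifiers are deliberately not copied; that is left to humans.
        return inverse

    # The sketch has no remove method at all, so in particular it never
    # removes statements added by human users.
```

Refusing to re-add a removed inverse is what stops the bot from repeatedly re-propagating a mistake that a human has already cleaned up.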
  • @Pasleim: It sounds to me like it would be good to have new properties expressing that father/child should be decided by looking at sex or gender (P21), and specifying which qualifiers should be copied. What do you think about that? ChristianKl 13:40, 31 December 2018 (UTC)
  • Another way would be to add a new qualifier that a user can put on a statement to mark that it should be copied in full, with its other qualifiers. That qualifier could then also be used to decide that if one statement is deleted, its inverse will be deleted as well. ChristianKl 13:42, 31 December 2018 (UTC)
  • I agree with @Pasleim and @Ghouston that "absolute" inverse properties shouldn't even exist; these properties should be bi-directional, with one label for the forward reference and one for the backward reference. Properties that are not fully inverse are trickier, but could perhaps work in a similar way by marking each statement as bi-directional or not. Those are big technical changes, so in the short term, enforcing the to-be-converted properties with a bot could be useful both as an immediate solution and as preparation for the migration effort. Villasv (talk) 11:50, 2 January 2019 (UTC)
  • I also think the automatic reverse property idea suggested by Ghouston would be the best solution to this problem. Take the MusicBrainz database, for example. If I add a relationship to an artist or release, its inverse relationship is automatically shown on the other related object. If I then remove the relationship from the related object, it disappears from the original. MusicBrainz manages relationships as entities separate from the objects they relate, rather than as properties of those objects, which makes this type of work easier to manage. Implementing a similar paradigm for statements on Wikidata might take a lot of work for the developers, but I think it would be well worth it in the end. — AfroThundr (u · t · c) 00:17, 2 January 2019 (UTC)
    • I agree with this view – it's not useful to handle inverses by automatically adding all matching pairs, and it unnecessarily duplicates data. @AfroThundr3007730, Ghouston, ChristianKl, Pasleim, Villasv, Bovlb: I think phab:T209559 is related, although it mainly discusses the problem and suggests multiple solutions. Jc86035 (talk) 10:28, 22 January 2019 (UTC)
      • The inverse statements already exist by implication. If A has property P with value B, then you can also write it as B having property P⁻¹ with value A. E.g., if person A has a child B, then person B has a parent A, since having a child and having a parent are two ways of looking at the same relationship. It's "simply" a matter of software tools making use of these inverses. Sometimes we do have properties that seem like inverses, like mother and child, but really they aren't. The inverse of mother would be "mother of", or something similar. Ghouston (talk) 11:17, 22 January 2019 (UTC)
  • Minor comment: can we please retitle this RfC to reflect that it's about creating inverse statements, not properties? Property creation is what property creators do, so "automatic inverse property creation" in the current title is confusing/misleading. —Galaktos (talk) 23:18, 1 January 2019 (UTC)
  • It's tricky. In some cases, the inverses follow logically: if A is employed by B, then B is the employer of A. You'd only need one property and a label for its inverse. In other cases, there aren't true inverses; e.g., if class A is part of class B, B doesn't necessarily have part A (since B may be a whole range of things, some of which don't have A). One issue with the API is that I don't think it has a way of handling items that appear in a huge number of statements, e.g., if we were asking for human (Q5) on the right-hand side of any statement. Ghouston (talk) 00:45, 2 January 2019 (UTC)
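Ghouston's point that inverse statements already exist by implication, and the MusicBrainz-style approach of storing each relationship once, can be illustrated with a small in-memory index. This is a Python sketch under simplifying assumptions (bare subject/property/value triples, no qualifiers or ranks); `TripleStore` is a hypothetical name, not part of Wikibase. Because each relationship is stored only once and indexed in both directions, removing it makes it disappear from both items, so there is nothing to keep in sync.

```python
from collections import defaultdict


class TripleStore:
    """Stores each statement once and indexes it in both directions."""

    def __init__(self):
        self._out = defaultdict(set)  # subject -> {(prop, value)}
        self._in = defaultdict(set)   # value -> {(prop, subject)}

    def add(self, subject, prop, value):
        self._out[subject].add((prop, value))
        self._in[value].add((prop, subject))

    def remove(self, subject, prop, value):
        self._out[subject].discard((prop, value))
        self._in[value].discard((prop, subject))

    def outgoing(self, item):
        """Statements where item is the subject, e.g. A child B."""
        return sorted(self._out.get(item, ()))

    def incoming(self, item):
        """Statements where item is the value; (P, A) reads as: item P^-1 A."""
        return sorted(self._in.get(item, ()))
```

For items like human (Q5), a real implementation would also need paging on `incoming`, which this sketch omits.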
Issues:
  • We want to avoid data falling out of sync. If someone edits a statement with an inverse property, the corresponding statement should not need a separate manual update. Additions, removals, changes, qualifiers, references, etc. should all stay in sync.
  • We want client wikis and tools to be able to access statements in both directions.
  • Page histories should not contain confusing elements. An edit done by a user should be associated with that user, not with a random bot.
  • There are some non-bidirectionally-inverse properties.
I have yet to see a solution that solves all of these. --Yair rand (talk) 01:35, 2 January 2019 (UTC)
  • Perhaps a good starting point would be a UI change: when the constraint notice/warning (whatever the term is) is shown, it could include an "add it" button that adds the inverse statement with one click. Nickw25 (talk) 10:16, 2 January 2019 (UTC)
    • That was my thought, too. Doing it automatically can be difficult since there might be many possible exceptions to the rule. But by having a button to create the inverse statement, we'd leave the decision up to the user. We already have User:Matěj Suchánek/moveClaim.js, a tool that provides a little button next to statements that lets you move or copy the statement to another item. Maybe something similar could be created to create the inverse statement on the other item? Not sure if such tools are able to look up the inverse statements on the properties for themselves or if they need a curated list of properties and their inverse properties as part of the tool's code. --Kam Solusar (talk) 11:39, 2 January 2019 (UTC)
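On the question of whether such a button needs a curated list: a tool could read the inverse directly from the property's own inverse property (P1696) statements. The Python sketch below assumes the property entity has already been fetched as JSON in the shape returned by the wbgetentities API; the `entity` example is hand-written and abbreviated, and `inverse_properties`/`suggest_inverses` are hypothetical helpers. Nothing is written to Wikidata; the functions only build the suggestion that an "add it" button would offer.

```python
def inverse_properties(property_entity):
    """IDs listed under 'inverse property' (P1696) on a property entity,
    skipping deprecated claims and 'no value'/'unknown value' snaks."""
    ids = []
    for claim in property_entity.get("claims", {}).get("P1696", []):
        if claim.get("rank") == "deprecated":
            continue
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue
        ids.append(snak["datavalue"]["value"]["id"])
    return ids


def suggest_inverses(subject, prop, value, property_entity):
    """Candidate inverse statements for an 'add it' button on
    (subject, prop, value); property_entity must be the entity for prop."""
    if property_entity.get("id") != prop:
        raise ValueError(f"expected the entity for {prop}")
    return [(value, inv, subject) for inv in inverse_properties(property_entity)]


# Abbreviated, hand-written example in wbgetentities JSON shape.
entity = {
    "id": "P155",
    "claims": {
        "P1696": [{
            "mainsnak": {
                "snaktype": "value",
                "property": "P1696",
                "datavalue": {
                    "value": {"entity-type": "property", "numeric-id": 156, "id": "P156"},
                    "type": "wikibase-entityid",
                },
            },
            "rank": "normal",
        }]
    },
}
print(suggest_inverses("Q_A", "P155", "Q_B", entity))  # [('Q_B', 'P156', 'Q_A')]
```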
Pasleim's DeltaBot experience discussed above is pretty instructive. To be concise, in my view (A) no, we do not want automatic inverse statements created, but (B) there should be some easy way via all UIs and APIs to access any statement of which the item in question is the object, rather than the subject, so that inverses are really not needed anyway. ArthurPSmith (talk) 16:40, 2 January 2019 (UTC)
Some inverse properties should be added in 99% of cases (child (P40)/father (P22), spouse (P26)/spouse (P26), category's main topic (P301)/topic's main category (P910), opposite of (P461)/opposite of (P461), follows (P155)/followed by (P156)...), sometimes with qualifiers.
Some are more problematic (e.g. contains the administrative territorial entity (P150)/located in the administrative territorial entity (P131) should be applied in both directions only to territorial entities, with exceptions).
For the first group there is User:Frettie/consistency_check_add.js. It is simple but not comfortable to use, since it works only with codes.
For the second group a tool would be fine, but only with manual confirmation.
JAn Dudík (talk) 10:34, 3 January 2019 (UTC)[reply]
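The two groups suggest a simple triage: candidates from the first group are applied directly, while candidates from the second group are only queued for manual confirmation. A minimal Python sketch; the tables contain only pairs named above, and `triage` is a hypothetical helper. The father → child direction is included because it needs no extra condition, whereas child → father also depends on the parent's sex or gender and is left out.

```python
# Group 1: added in ~99% of cases, applied directly (illustrative subset).
AUTO = {
    "P22": "P40",    # father -> child
    "P26": "P26",    # spouse (symmetric)
    "P301": "P910",  # category's main topic -> topic's main category
    "P910": "P301",
    "P461": "P461",  # opposite of (symmetric)
    "P155": "P156",  # follows -> followed by
    "P156": "P155",
}
# Group 2: problematic pairs, only proposed for manual confirmation.
REVIEW = {"P150": "P131", "P131": "P150"}


def triage(candidates):
    """Split (subject, prop, value) statements into inverses to apply
    directly and inverses a human should confirm first."""
    apply_now, needs_review = [], []
    for subject, prop, value in candidates:
        if prop in AUTO:
            apply_now.append((value, AUTO[prop], subject))
        elif prop in REVIEW:
            needs_review.append((value, REVIEW[prop], subject))
    return apply_now, needs_review
```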
 Comment This is definitely within the scope of WikiProject Reasoning. @Markus Krötzsch: is probably especially interested. author TomT0m / talk page 14:56, 3 January 2019 (UTC)
My view is that we should not have explicit inverse properties. It confuses users, leads to inconsistencies, and cannot be satisfactorily automated. Instead we should support implicit inverse properties by making it easier to browse claims for which the focal item is the object. Bovlb (talk) 17:48, 7 January 2019 (UTC)[reply]

Properties to specify how a bot should handle a particular property

  • @Pasleim: It sounds to me like the information about child/father/mother could be expressed via new properties on the property page. We could also have a new property to specify which qualifiers should be copied by a bot, and another property to specify which inverses should be created automatically by the bot.
For child/father/mother we need at least two properties: one to say that sex or gender (P21) is the relevant discriminating property, and a second one to say that if the object has sex or gender (P21)=female (Q6581072) then mother (P25), and if sex or gender (P21)=male (Q6581097) then father (P22). For other property pairs a third property is probably needed. For example, for located in the administrative territorial entity (P131)/contains the administrative territorial entity (P150) one needs to look at the instance of (P31) value and check whether it is administrative territorial entity (Q56061) or a subclass thereof. Another solution is to specify SPARQL queries that return all possible candidates. To specify which inverse statements should be created automatically by the bot, we could add instance of (P31)="property with automatic inverse completion". --Pasleim (talk) 14:03, 3 January 2019 (UTC)
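This proposal amounts to storing a conditional rule per property. The Python sketch below encodes both examples: child (P40) resolved via the parent's sex or gender (P21), and located in the administrative territorial entity (P131) inverted only when the subject is an instance of administrative territorial entity (Q56061) or of a subclass of it via subclass of (P279). The `RULES` format and the flat `claims` dictionary are hypothetical stand-ins for the proposed property-page statements and for real item data.

```python
def is_a(item, target_class, claims):
    """True if item is an instance (P31) of target_class or of a subclass (P279*)."""
    stack, seen = list(claims.get(item, {}).get("P31", ())), set()
    while stack:
        cls = stack.pop()
        if cls == target_class:
            return True
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(claims.get(cls, {}).get("P279", ()))
    return False


# Hypothetical rule format standing in for statements on the property pages.
RULES = {
    # child: choose the inverse from the parent's sex or gender (P21)
    "P40": {"switch_on": "P21", "cases": {"Q6581072": "P25", "Q6581097": "P22"}},
    # located in: invert only for administrative territorial entities
    "P131": {"inverse": "P150", "subject_class": "Q56061"},
}


def inverse_for(subject, prop, value, claims):
    """Return the inverse (subject, prop, value) triple, or None if no rule
    applies or the condition cannot be decided (left to a human)."""
    rule = RULES.get(prop)
    if rule is None:
        return None
    if "switch_on" in rule:
        for v in claims.get(subject, {}).get(rule["switch_on"], ()):
            if v in rule["cases"]:
                return (value, rule["cases"][v], subject)
        return None
    if is_a(subject, rule["subject_class"], claims):
        return (value, rule["inverse"], subject)
    return None
```

Returning None whenever the condition is undecidable (for example, no P21 value on the parent) keeps the bot from guessing.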

Improving usability, visibility and features of inverse statements

From the user perspective, the most immediate way to see which items are connected to any item is Special:WhatLinksHere. However, this page is not well suited to Wikidata's needs because it only shows that there is a link, not what kind of link, or whether there are any inverse statements. I think this page could be improved to better display the relationships between items, as MusicBrainz does, and to manage the connections. For instance, some features that could be applied to this page:

  • Group incoming links by property and allow filtering by property
  • Show more information about the incoming statement
  • The incoming link should point to the statement instead of to the item
  • Highlight statements connected by an inverse property
  • Allow adding the statements, with one click, to the item that receives the incoming link
  • Give options to flag statements to be synched
  • Make the access to Special:WhatLinksHere more prominent in the user interface

Statement synching is tricky because sometimes we want them to be independent, but if we could specify which statements we want to keep synched (and what kind of synch) then a lot of work could be saved. I don't know if that page would be the best place to store information about which statement to keep synched and how, but as it is now, there is no space for this kind of information.--Micru (talk) 12:25, 4 January 2019 (UTC)[reply]
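Several of the listed features (grouping incoming links by property, filtering by property, highlighting statements connected by an inverse property) can be sketched in a few lines of Python. The inputs, a flat list of (subject, property, value) triples and an `inverse_of` table, are assumptions standing in for real Wikidata data; `group_incoming` is a hypothetical name.

```python
from collections import defaultdict


def group_incoming(item, statements, inverse_of):
    """Group statements pointing at item by property and flag those whose
    inverse statement also exists on item (i.e. the pair is 'mirrored')."""
    outgoing = {(p, v) for s, p, v in statements if s == item}
    groups = defaultdict(list)
    for s, p, v in statements:
        if v != item:
            continue
        inv = inverse_of.get(p)
        mirrored = inv is not None and (inv, s) in outgoing
        groups[p].append({"from": s, "mirrored": mirrored})
    return dict(groups)
```

Filtering by property is then just a lookup such as `groups.get("P361", [])`, and the `mirrored` flag is what a page would use to highlight synced pairs.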

There's https://tools.wmflabs.org/sqid/#/view?id=Q1339, which currently gives users a better view than Special:WhatLinksHere. ChristianKl 17:40, 4 January 2019 (UTC)
For such external tools and potential internal solution it would be good to have a place to store the inverse label of a property, for example "superclass of" for subclass of (P279). --Pasleim (talk) 18:00, 4 January 2019 (UTC)[reply]
@Pasleim: That is a good idea! Could you please file it on wd:pp? --Micru (talk) 16:50, 5 January 2019 (UTC)
The question is how to specify the "virtual property". How about creating a new datatype "virtual", which means you can't manually add the property anywhere? ChristianKl 19:41, 5 January 2019 (UTC)
Virtual datatype is an interesting idea and could be used for other calculated properties too, not just inverse properties. The drawback is that it often takes a year or more till completely new features are developed. As a (temporary) alternative with the tools at hand, we could propose a property with monolingual datatype to store the inverse labels on the original property. --Pasleim (talk) 17:22, 8 January 2019 (UTC)[reply]
@Lydia_Pintscher_(WMDE): Can you comment on the amount of effort it would take to have such a new property type added? ChristianKl 11:23, 12 January 2019 (UTC)
I'm not sure I understand the idea. We have been floating the idea of virtual statements. I have a short description here. Is this what you mean? Or something else? --Lydia Pintscher (WMDE) (talk) 17:12, 12 January 2019 (UTC)[reply]
@Lydia Pintscher (WMDE): I don't know what EXIF is. As a starting point, we would only need a new namespace that's analogous to the existing property namespace. Entries in that new namespace shouldn't be usable as properties, but should be able to be objects or subjects wherever properties can currently be subjects and objects. I should be able to say SUBCLASS_OF HAS_INVERSE_PROPERTY SUPERCLASS_OF. I shouldn't be able to say GENDER SUPERCLASS_OF MALE. ChristianKl 18:47, 12 January 2019 (UTC)
See Exif. It's basically embedded meta data in image files.
As for the new namespace and entity type: That's probably not super complicated. Where it gets potentially very complicated is when it comes to what should then happen based on what's in that namespace. --Lydia Pintscher (WMDE) (talk) 12:53, 27 January 2019 (UTC)[reply]
@Lydia Pintscher (WMDE): It would be possible to have the namespace and leave "what should happen" to third-party development for one or two years, and then get a better understanding of what should be implemented by the main team. ChristianKl 14:32, 27 January 2019 (UTC)
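The rule ChristianKl describes, where entries in the new namespace may appear as subjects or values but never as the property of a statement, can be stated as a simple validation check. A minimal Python sketch, assuming a hypothetical `V` ID prefix for the new namespace (no such prefix exists today); `validate_statement` and the `P_HAS_INVERSE` property ID are illustrative only.

```python
VIRTUAL_PREFIX = "V"  # hypothetical ID prefix for the proposed namespace


def is_virtual(entity_id):
    return entity_id.startswith(VIRTUAL_PREFIX)


def validate_statement(subject, prop, value):
    """Reject statements that use a virtual entry as the property.
    Virtual entries remain allowed as subject or value."""
    if is_virtual(prop):
        raise ValueError(f"{prop} is virtual and cannot be used as a property")
    return (subject, prop, value)


# Allowed: SUBCLASS_OF HAS_INVERSE_PROPERTY SUPERCLASS_OF (V1 = "superclass of").
validate_statement("P279", "P_HAS_INVERSE", "V1")
# Rejected: GENDER SUPERCLASS_OF MALE would raise ValueError.
```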
@Lydia Pintscher (WMDE): I like the idea of virtual statements (for inverse, inferred, or calculated values), however there should be a space where the behaviour can be defined. I imagine that a new entity type for the rules that generate the virtual statements could be useful, since it could follow the same system as constraints. Btw, what do you think about improving Special:WhatLinksHere?--Micru (talk) 09:49, 15 January 2019 (UTC)[reply]
Not opposed to improving whatlinkshere at all. What would you change? --Lydia Pintscher (WMDE) (talk) 12:53, 27 January 2019 (UTC)[reply]
@Lydia Pintscher (WMDE): There are many ways to improve it. The most immediate one I can think of is to group the links by the property that links to the item. With a bit more effort, it would be nice to be able to show only the items that link through a certain property. Even better would be to indicate which inverse statements present in other items are also present in the current item. --Micru (talk) 17:50, 27 January 2019 (UTC)
@Lydia Pintscher (WMDE): Special:WhatLinksHere is a generic MediaWiki feature: I'm not sure that it's the best place to display this information. I think the best place would be the item pages themselves: display the statements where the item appears on the right just like they are displayed for the left. There's also another feature that's needed to make this practical: when there are a lot of statements of the same kind, display only a subset plus a link for displaying the rest (perhaps as a paged list). Pages like human (Q5) need to be still usable even though they appear in an enormous number of statements. There are already items that appear on the left side of an enormous number of statements, e.g., Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC (Q21481859) with its enormous number of authors: it would help there too. Ghouston (talk) 01:04, 13 February 2019 (UTC)[reply]
Listing the "first few" statements is a bit arbitrary. Perhaps there could be a preference for the maximum number of statements of the same kind to be displayed, with a default of 50 or so. If the number was greater than that, just put a link with something like "There are more than N statements of 'has instance', click here to see them". The inverse label would be taken from a new property, or use something like "inverse P999" if not set. The API would need a similar mechanism. Ghouston (talk) 01:28, 13 February 2019 (UTC)[reply]
(Really, there should probably be a maximum number of statements of any given kind to display for normal statements too.) --Yair rand (talk) 06:26, 13 February 2019 (UTC)
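Ghouston's display rule (a per-user limit with a default of about 50, a "more" link beyond it, and an inverse label taken from a property or falling back to "inverse P999") could look like the Python sketch below. `render_incoming` and the `inverse_labels` mapping are hypothetical; in practice the labels would come from the proposed monolingual-text property and the link would point to a paged list.

```python
def render_incoming(groups, inverse_labels, limit=50):
    """groups: property ID -> list of subject IDs pointing at the current item.
    Returns plain-text display lines, capped at `limit` entries per property."""
    lines = []
    for prop in sorted(groups):
        subjects = groups[prop]
        label = inverse_labels.get(prop, f"inverse {prop}")
        lines.append(f"{label}: {', '.join(subjects[:limit])}")
        if len(subjects) > limit:
            lines.append(
                f"There are more than {limit} statements of '{label}', "
                "click here to see them"
            )
    return lines


print(render_incoming({"P999": ["Q1"]}, {}))  # ['inverse P999: Q1']
```

The same cap could apply to ordinary (outgoing) statements, as suggested in the reply above.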
@Micru, ChristianKl, Lydia Pintscher (WMDE): I think improving Special:WhatLinksHere would be useful, but it wouldn't resolve the issue of inverse statements being inaccessible in MediaWiki. ("Inverse isn't necessary" is also the sole reason for two of my property proposals being rejected.) If it's only useful internally to Wikidata then it doesn't have to be in the actual Wikidata/Wikibase source code anyway. User:Pasleim/derivedstatements.js can already display items' inverse statements, although it omits qualifiers and references. Jc86035 (talk) 17:10, 27 January 2019 (UTC)[reply]
+1. People don't understand that Wikidata is a database and that data are not presented or formatted for the user like in Wikipedia. Data have to be retrieved or queried, not read. Instead of creating some special features for inverse properties, we have to think about inference. Inverse properties are only a special case of inference.
Instead of focusing on a special tool that extracts a specific set of data, it would be better to think about an external tool offering the possibility to generate data from WD.
For example to improve the classification instance/subclass, it would be great to generate the full tree of classification using instance/subclass (if A is an instance of B and B is a subclass of C, we should be able to generate the information that A is an instance of C).
So the critical question is not which tool or link has to be used, but which data model ensures that data can be queried in the same way, independently of the subject field.
Just an example: if we assume that the father and child properties are redundant, which one has to be kept? The same choice has to be made for part of/has part and other pairs of inverse properties. My first proposal would be to define a kind of "bottom-up" rule. Snipre (talk) 13:12, 12 February 2019 (UTC)
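The instance/subclass example above is a transitive inference: follow instance of (P31) once, then subclass of (P279) any number of times. A minimal Python sketch over a hypothetical in-memory `claims` dictionary; on the Wikidata Query Service the same inference is usually written as the SPARQL property path `wdt:P31/wdt:P279*`.

```python
def inferred_classes(item, claims):
    """All classes `item` is an instance of, directly or via subclass chains.
    claims: item ID -> {property ID -> set of value IDs}."""
    result = set()
    stack = list(claims.get(item, {}).get("P31", ()))
    while stack:
        cls = stack.pop()
        if cls in result:
            continue  # already visited; also protects against subclass cycles
        result.add(cls)
        stack.extend(claims.get(cls, {}).get("P279", ()))
    return result
```

If A is an instance of B and B is a subclass of C, `inferred_classes("A", ...)` contains both B and C, without C ever being stored on A.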

Where to store the indications for automated actions?

From what I have read here, it is difficult to know when to add an inverse statement, what behaviour the bot should follow, and how to make users aware that the statement was added automatically. I can imagine a property "adds inverse value" with possible options "overwrite value changes", "keep connected", "only write once", etc., and perhaps the counterpart "inverse value added from". However, I don't know whether it would be a qualifier, a source, or something else to be added in a yet-to-be-defined field. Thoughts? --Micru (talk) 17:01, 5 January 2019 (UTC)
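To make the proposed options concrete, here is one possible reading of them as a Python enum plus a function deciding how a bot would react to changes on the source statement. The option names come from the comment above; their exact semantics (for example, that "overwrite value changes" propagates edits but not deletions) are assumptions for illustration only.

```python
from enum import Enum


class SyncMode(Enum):
    OVERWRITE = "overwrite value changes"  # assumed: propagate additions and edits
    KEEP_CONNECTED = "keep connected"      # assumed: also propagate removals
    WRITE_ONCE = "only write once"         # assumed: create once, then hands off


def react(mode, event, already_written=False):
    """Action on the inverse statement after `event` ('added', 'changed' or
    'removed') happens to the source statement."""
    if event == "added":
        if mode is SyncMode.WRITE_ONCE and already_written:
            return "skip"
        return "write"
    if event == "changed":
        return "skip" if mode is SyncMode.WRITE_ONCE else "update"
    if event == "removed":
        return "remove" if mode is SyncMode.KEEP_CONNECTED else "skip"
    raise ValueError(f"unknown event: {event!r}")
```

Whatever field ends up storing the mode, the bot only needs to read it per statement and apply a table like this one.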