Wikidata talk:Living persons (draft)

Proposal

I think something like this is necessary but it needs to have a little bit of teeth and at the very least live up to the resolution of the board of the Wikimedia Foundation. Asking that the statements be "supported by information in at least one corresponding Wikipedia article, that (in the article) has a citation to a reliable[note 1] source" is not enough because there's no reason to think that the source (even if we identify it as reliable) has anything to do with the statement being added to Wikidata. If a statement is possibly controversial or privacy-invading, then it should not be added without a direct reliable source. In other words, it can only be imported from Wikipedia if we can also import the reference for that claim. Pichpich (talk) 02:37, 15 June 2013 (UTC)

When writing this, I was hoping for Wikidata to be able to use its own sourcing system. However, I'm not sure that the system is ready yet, and thus put in the Wikipedia sourcing as a temporary measure.--Jasper Deng (talk) 03:18, 15 June 2013 (UTC)
I guess I have a stricter view on this. We don't have sourcing, ok, let's work on that. But until we have sourcing, we don't add potentially controversial statements about living people, period. It's not a huge problem: there are plenty of other statements that we can add and source (even under the current limitations) and, as far as I understand, tools for sourcing will be available in the near future. Pichpich (talk) 03:09, 16 June 2013 (UTC)
Currently this policy makes no distinction between potentially controversial statements and statements that aren't. ChristianKl (talk) 08:15, 9 July 2016 (UTC)

Definition of Living person

What's a living person for the sake of this policy? Could you define the term? ChristianKl (talk) 08:15, 9 July 2016 (UTC)

It is any (particular) human who is currently physically alive - essentially, whatever definition the foundation intended in its resolution.--Jasper Deng (talk) 08:24, 9 July 2016 (UTC)
That would mean that there's no information about who is a living person in Wikidata and no automated tool could police living people. ChristianKl (talk) 07:30, 23 January 2017 (UTC)
@ChristianKl: An automated tool could do it if "no value for death date" was taken to mean "living" (as opposed to "unknown (or empty) value for death date"). Basically, every human should have a death date statement. Sam Wilson 07:57, 23 January 2017 (UTC)
Saying "everybody who has no death date is living" is quite a different definition from the one that Jasper Deng proposed. When writing policy it's worth being clear about what the operational definitions are. ChristianKl (talk) 14:49, 23 January 2017 (UTC)
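
To make the operational difference concrete, here is a minimal sketch of the rule Sam Wilson describes, under which only an explicit "no value" statement for date of death (P570) marks a person as living. It assumes the three snak types of the Wikidata data model (value, novalue, somevalue); the names `DeathSnak` and `living_status` are hypothetical and do not refer to any existing tool.

```python
from enum import Enum
from typing import Optional


class DeathSnak(Enum):
    """Kind of date of death (P570) statement found on an item (hypothetical helper)."""
    VALUE = "value"            # a concrete death date is recorded
    NO_VALUE = "novalue"       # explicit claim that there is no death date
    SOME_VALUE = "somevalue"   # the person died, but the date is unknown


def living_status(death: Optional[DeathSnak]) -> str:
    """Classify an item under the 'explicit no value means living' rule.

    `death` is None when the item has no P570 statement at all.
    """
    if death is None:
        # A missing statement says nothing either way under this rule.
        return "undetermined"
    if death is DeathSnak.NO_VALUE:
        return "living"
    # Both a known and an unknown death date mean the person has died.
    return "deceased"
```

Under this rule an item with no P570 statement is left undetermined rather than treated as living, which is where it differs from "everybody who has no death date is living".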

Must and should

Given the definitions of "must" and "should" as laid out in RFC 2119, is the word "must" really the right one for this policy? Especially when it comes to information on talk pages. ChristianKl (talk) 08:21, 9 July 2016 (UTC)

The "must" is indeed intended in the IETF sense, because of the importance of verifiability for living persons' information, as stated in the foundation resolution.--Jasper Deng (talk) 08:24, 9 July 2016 (UTC)
Does this mean that if a lobbyist edits a Wikidata entry, you in general intend to forbid any discussion on Wikidata of the fact that the user account is used by a lobbyist?
Furthermore, the Foundation resolution doesn't state that information must be verifiable, or even that it should be in the RFC 2119 sense. ChristianKl (talk) 09:04, 9 July 2016 (UTC)

Following up a sentence that contains `must` with one that contains `especially` also makes no sense under the RFC 2119 meaning. ChristianKl (talk) 09:20, 9 July 2016 (UTC)

Document how to record living people?

I might have just overlooked it elsewhere, but is it worth adding to this page the fact that for living people date of death (P570) should have a value of 'no value'? Or at least linking to the documentation for that? —Sam Wilson 05:35, 23 January 2017 (UTC)

The solution now is that it doesn't have to be explicitly stated via 'no value'; instead we have different standards, as seen on the page. ChristianKl () 12:46, 6 December 2017 (UTC)

Privacy

This page only discusses verifiability, not privacy. We should have something like en:WP:BLPPRIVACY and en:WP:BLPPRIMARY, otherwise we may get private information like home addresses, personal phone numbers or even identity card numbers added to Wikidata (the first two have related Wikidata properties which may be misused).--GZWDer (talk) 16:36, 23 January 2017 (UTC)

Additions

I added a definition section and sections on controversial statements and privacy concerns, based on the above discussion and discussion at Project Chat. Further edits welcome. I am wondering if "blood type" really should be in the controversial category? ArthurPSmith (talk) 15:53, 3 April 2017 (UTC)

That's medical information and in most countries very strictly protected by privacy laws. Why in the world is there a blood type field for people in any case? So weird. Jytdog (talk) 09:54, 9 April 2017 (UTC)
In Japan the blood type of people has cultural significance. There are folk beliefs about how personality correlates with blood type. As a result the blood type of celebrities is often listed in Japanese pop-biographies, and the Japanese Wikipedia frequently includes information about blood type. It would be worth checking how well referenced the Japanese Wikipedia is in this regard to decide whether we should require sources. ChristianKl (talk) 08:29, 11 April 2017 (UTC)
wow. crazy. the stuff we learn while doing this work! but oy mixing up celebrity gossip with medical information is a mess. Jytdog (talk) 19:40, 11 April 2017 (UTC)

Special bot policy?

On project chat the issue of special scrutiny for bots editing items about living people was raised - so should there be a special review procedure documented here for this case? ArthurPSmith (talk) 15:55, 5 April 2017 (UTC)

in my view, heck yes! Jytdog (talk) 09:54, 9 April 2017 (UTC)
This reminds me of the incident of adding the CHEMBL data about illness drug associations to drug used for treatment (P2176). It might be worthwhile to think more generally about how we review bot activity. ChristianKl (talk) 08:32, 11 April 2017 (UTC)
@Jytdog, ChristianKl: Ok, I added another section specifically on bot approval, is this perhaps suitable? ArthurPSmith (talk) 13:16, 11 April 2017 (UTC)
Could you give an example of a closed database where you want to forbid the data import with this policy? Otherwise I don't see anything objectionable. ChristianKl (talk) 16:13, 11 April 2017 (UTC)
That third bullet is likely to be a deal killer. If you plan to run an RfC to make this policy it might be wise to take it out of the proposal, and instead propose it as a separate item for people to !vote on. Jytdog (talk) 19:44, 11 April 2017 (UTC)
Given that the third point uses should and not must, I don't think the point is likely to be a deal killer. ChristianKl (talk) 11:04, 12 April 2017 (UTC)
ChristianKl - http://www.whitepages.com for example. There are many such services, some of which provide some limited data for free and then charge for more detail on individuals. I don't think any of them should be used, whether from their free data or premium data, as sources for living people info in wikidata. ArthurPSmith (talk) 16:08, 12 April 2017 (UTC)
Okay, I agree that it's sensible to not directly import the information from databases like http://www.whitepages.com . ChristianKl (talk) 17:42, 12 April 2017 (UTC)
"adequate BLP policy"? who decides - Asaf ? ; "community is happy with as a reliable source"? what happens when we have a consensus at a community that does not agree with English? it seems rather top down, and adversarial. why are you not creating a LP upload group to coach data uploaders. they can develop their standards of practice. why not survey uploaders and ask for their standards. why dictate? Slowking4 (talk) 21:13, 13 April 2017 (UTC)
We already have a process for approving bots. This proposal is not about having a totally new way to approve bots but just suggests that certain issues should be considered when approving. ChristianKl (talk) 15:04, 14 April 2017 (UTC)
Do you have some examples of problematic edits by bots, that would be trapped by this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:07, 14 April 2017 (UTC)
@Pigsonthewing: I remember one bot that automatically scraped social network pages to link them to individual people. We already rejected that bot with our present way of forming consensus but I think it's valuable to have an explicit policy about what violates privacy instead of making case by case decisions. I don't expect the bot policy as written to change the current status quo of how we deal with bots in a significant way. ChristianKl (talk) 07:54, 30 May 2017 (UTC)
By "bot", do we also mean users who use tools like PetScan or QuickStatements? --ValterVB (talk) 15:28, 14 April 2017 (UTC)
I don't think we need to define the term bot for the purposes of this policy page. The decision of what counts as a bot should be up to our bot policy. To the extent that that policy is currently unclear about when usage of QuickStatements counts as a bot, making it clear is a worthwhile project, but I don't think it's the project of this policy. ChristianKl (talk) 08:19, 30 May 2017 (UTC)

openly supplied by the individual themselves

you have a conflict between "unless they can be considered widespread public knowledge or openly supplied by the individual themselves" and "reliable open information sources such as newspapers or other media outlets". i.e.-a twitter disclosure of sexual orientation, or gender identity. do you wait for secondary sources to report, which may be a while for less notable people, or link to a blog or twitter primary source which is less reliable. Slowking4 (talk) 14:08, 16 April 2017 (UTC)

I think the idea is that we don't want to have a bot that simply scrapes all information from social media profiles and copies that information into Wikidata. As a result the standards for bot work are a bit stricter. ChristianKl (talk) 16:28, 16 April 2017 (UTC)

Information on non-item pages

We still need a section about non-item pages. Does anybody have good ideas of how to word it? ChristianKl (talk) 15:57, 18 April 2017 (UTC)

For non-item pages I would guess Wikidata is more similar to the language Wikipedias - enwiki's non-article space policy section could be copied with minor changes, for instance? ArthurPSmith (talk) 19:28, 18 April 2017 (UTC)
That sounds like a good idea. I copied the section from enwiki and made a few adaptations. Feel free to refine the text further. ChristianKl (talk) 21:35, 18 April 2017 (UTC)
looks good to me, thanks! ArthurPSmith (talk) 19:34, 19 April 2017 (UTC)

Scope

It seems a bit odd that this page outlines what we can say about living people, but not whether we should include them. Broadly speaking, Wikidata:Notability means that no-one who isn't a public figure to some degree should have a Wikidata item, but some of the people we include are very much on the edge of public, and this does feel a little uncomfortable at times. I wonder if we should have a section here that effectively says "if this person is not a public figure, please consider whether it is appropriate to include them". Andrew Gray (talk) 22:28, 29 May 2017 (UTC)

Currently, Wikidata:Notability says nothing about a person having to be a public figure to be included in Wikidata. It just requires serious and public sources that can be used to describe the person. This policy page is not about changing our current definition of notability. ChristianKl (talk) 10:44, 30 May 2017 (UTC)
Wikidata will naturally have far more items for living people than a wikipedia for structural reasons - many of our properties want items as values, so we often create the items for people who are related to a notable person or other entity, which certainly doesn't mean those people are public figures in themselves. ArthurPSmith (talk) 14:33, 30 May 2017 (UTC)

Information retrieved from Wikipedias

The information that is included in any of the WMF projects has the same underlying premise; the policy of the foundation. Once a Wikipedia has entered data it must be understood that the data is ok. This is the only way that allows the current practice of populating Wikidata from the Wikipedias. When this is not accepted, current practices are no longer possible and Wikidata will die slowly. Its quality will go down.

There is another side to this: when a Wikidata statement is challenged, it follows that the upstream data is challenged. Now this means that we should seek a closer link with the projects we gain our data from. It means that we accept as good what we share as being the same. As I have argued already all too often, this is where our time on details pays off. Plenty of examples are available, but given the sheer amount of data in Wikidata, this policy will destroy Wikidata when it is applied in a Wikipedia manner. Thanks, GerardM (talk) 06:08, 11 September 2017 (UTC)

"Once a Wikipedia has entered data it must be understood that the data is ok". Ideally this would be true. It isn't. And promulgating practices based on that decreases data quality. Nikkimaria (talk) 00:40, 12 September 2017 (UTC)
"promulgating practices based on that decreases data quality." = false statement; rather, data quality remains unchanged. but better to have a data quality improvement process here, to then push out to the wikipedias, if some people would allow incorporating data from wikidata. you will not increase quality by slogans or by gatekeeping behavior. Slowking4 (talk) 17:52, 12 September 2017 (UTC)
A basic level of gatekeeping is key to maintaining data quality - not a slogan, just a fact. Nikkimaria (talk) 19:34, 12 September 2017 (UTC)
no, a basic level of quality control is key to maintaining data quality. the perpetual attempt to do so by the failed methods of gatekeeping, or increasing the scrap rate is not based on factual evidence, it is based on an ideology. Slowking4 (talk) 16:07, 14 September 2017 (UTC)
"Once a Wikipedia has entered data it must be understood that the data is ok" is based on an ideology, not on factual evidence. What do you see as the difference between "a basic level of quality control" and "gatekeeping"? Nikkimaria (talk) 02:56, 15 September 2017 (UTC)
yes, your arguments are based on an ideology only, and you have presented no factual evidence. "it must be understood that the data is ok" is a starting point; and "practices based on that decreases data quality" is false; rather all data's quality can be improved. there is no go/no go about data, and you have no rational standard to determine your go/no go. the notion that you only get one bite at the quality apple upon upload or link is false, and not a rational basis to improve quality. clearly you need to go to school on quality. start here: w:W. Edwards Deming. when you can discuss the 14 principles, then we can collaborate, but not before. Slowking4 (talk) 16:45, 15 September 2017 (UTC)
"Eliminate the need for massive inspection by building quality into the product in the first place". Nikkimaria (talk) 18:21, 15 September 2017 (UTC)
"Eliminate slogans, exhortations, and targets for the work force asking for zero defects and new levels of productivity. Such exhortations only create adversarial relationships, as the bulk of the causes of low quality and low productivity belong to the system and thus lie beyond the power of the work force." Slowking4 (talk) 02:13, 16 September 2017 (UTC)
In this particular case, the "system" is indeed within the "power of the work force" - we as a community develop policies, guidelines, and practices that can address causes of low quality and other problems. That's the whole point of this draft, for example - to develop a standard of quality control for a particular type of data. Nikkimaria (talk) 02:54, 16 September 2017 (UTC)
what have you ever done to improve quality in a non-adversarial way? i see no standard, or work to improve a system, but rather a veto with constantly shifting rationale, "no the quality is still not good enough not to delete"; and "if you persist, i will block you." Slowking4 (talk) 09:11, 16 September 2017 (UTC)
Again, the point of this draft and others is to develop a quality standard and improve the system. If you choose to take an adversarial approach there's really nothing I can do about that, other than to say I think this conversation has gone well past the point of usefulness. Nikkimaria (talk) 13:13, 16 September 2017 (UTC)
  • Ok, I blogged about an error in English Wikipedia. It came to light on Wikidata thanks to a quality assessment query. I have not improved on en.wp because of its adversarial stance. I made several proposals that enable better quality assurance practices for all Wikipedias and Wikidata. They are based on the premise that once one project is in disagreement on a statement or a fact, there is a need for attention. By concentrating on our differences and not on where our data agrees, we focus on quality issues. My point is very much that I regularly find issues with data retrieved from English Wikipedia that come to light by comparing the data from other Wikipedias.
My question to anyone who wants a BLP practice for Wikidata, how can you insist on the one that is proposed when collaboration improves quality for all our projects? Thanks, GerardM (talk) 14:05, 16 September 2017 (UTC)
I have no strong opinion about the "draft" or not here, since we already have a sort of global policy. But I see one large flaw in your reasoning above, GerardM. Yes, we discover false statements with the help of cooperation, but it does not stop them from being re-added here. I have removed the same "coat of arms", "sister cities" and "official website" hundreds of times from the same items, but they are constantly re-added here. Wikidata has not changed the behaviour on Wikipedia, only documented its flaws. We need a new, better strategy, not one that only redoes the same mistakes we have made for more than a decade on Wikipedia. -- Innocent bystander (talk) 16:04, 16 September 2017 (UTC)
That is exactly the problem. We do not cooperate on BLP. Thanks, GerardM (talk) 21:22, 16 September 2017 (UTC)
Or anywhere else! My solution to that problem is that every claim that is imported from an infobox at Wikipedia has, at the same time, to be removed from that infobox. Otherwise there is no cooperation. -- Innocent bystander (talk) 07:53, 17 September 2017 (UTC)
I agree with you Innocent bystander, but this only works where infoboxes are Lua-generated from Wikidata items… it's the case for most of ruwiki's person templates, but some projects simply refuse it, considering that infoboxes are the sole responsibility of the article editors..., and Wikidata-driven infoboxes are a very controversial issue :( -- see frwiki, for instance. --Hsarrazin (talk) 09:29, 17 September 2017 (UTC)
This is more or less exactly the thing I do on svwiki, not with human (Q5)-items, since it is controversial, but with geographic places. But I prefer to add data from good sources, not from Wikipedia. -- Innocent bystander (talk) 09:49, 17 September 2017 (UTC)

ACM on transparency and accountability

the ACM has issued a policy on transparency and accountability

the following principles should be included in a BLP policy.

  1. access and redress we should have a simple interface and reaction team to respond to subjects of the data
  2. explanation we should have a simple explanation of where BLP data comes from and possible uses of the data
  3. provenance we should provide a permanent trail of where the data came from
  4. validation and testing we should provide a quality control chart, and test proposed policies about BLP and only implement those that improve data quality.

Slowking4 (talk) 16:39, 14 September 2017 (UTC)

According to the ACM the document you reference is about algorithms whereby "An algorithm is a self-contained step-by-step set of operations that computers and other 'smart' devices carry out to perform calculation, data processing, and automated reasoning tasks."
Wikidata doesn't do automated reasoning and data processing. Wikidata is an open database where data can come from a variety of sources and can be used for a variety of purposes. What kind of explanations would you like to see in the text?
When you call for a permanent trail of where data comes from, is your demand that we have a bot that automatically removes data from BLP that doesn't have references? ChristianKl (talk) 20:43, 14 October 2017 (UTC)
It is wonderful that the ACM has something to say. However, "should" implies that we have no choice. This is exactly where you are wrong. As it is, there is a plethora of sources of our information, so it will not be simple. In many of our practices we do not register where data comes from, and just because an outside organisation has some wise words to say, it does not follow that we should. When you are talking about validation and testing, there are plenty of opportunities where we could make a qualitative difference. We have known about these for a long time and we don't act on them. Now first do what we easily could do before telling us what we should do. Thanks, GerardM (talk) 08:08, 7 December 2017 (UTC)

alternate draft language

replace the draft with the following:

Data quality is an essential value of the wikidata community. This is especially true for data about living people.

principles

  1. Wikidata will provide for special attention to the principles of neutrality and verifiability in data about people.
  2. Personal privacy will be respected especially for people who are not public figures;
  3. Wikidata will investigate new technical mechanisms to assess edits, and provide reports to interested projects and quality circles, to assess and improve data quality; Wikidata will implement quality data management.
  4. Wikidata will institute a landing page, safe space, and response team to respond to complaints about data about people in Wikidata with patience, kindness, and respect, and encouraging others to do the same.

implementation

  1. Wikidata:Quality improvement
  2. Wikidata:Privacy
  3. Wikidata:LP technical
  4. Wikidata:Lounge

further reading

discussion

proposing to resolve this with another layer of indirection? But this means we'll have at least 4 pages to argue over, instead of 1... On privacy, the current page has some guidelines in this regard, do you have anything further in mind beyond what's already stated here? When you say "Wikidata will" what does that mean in practice? Administrators will enforce ...? The community will be expected to ...? WMDE/WMF will ... ? ArthurPSmith (talk) 13:05, 18 September 2017 (UTC)
i thought it was quite direct. agree on general principles, and then work out implementation as we go. it is action oriented, so no argument, merely work. the practice will be in the implementation. i did not mention enforcement. there is no requirement of micromanagement of implementation or listing rules enforcement: that is an artifact of another community. the long citation list shows that data quality principles are in every database software package. nothing to debate there. there is m:Privacy policy, but if you want to reinvent that wheel, go for it. Slowking4 (talk) 03:20, 19 September 2017 (UTC)
We don't have documents that define that certain work should be done because there's nobody towards which an RFC can task the work of developing an entirely new software feature. Wikidata isn't developing software. The WMDE is developing software with support of the WMF and as it's an open source program everybody can pitch in and submit his own code.
If you want to write a landing page there's no need to have a document that says that we should have a landing page before you write a draft of a landing page. ChristianKl (talk) 21:25, 14 October 2017 (UTC)
we do not have documents or a plan, because there are no managers, merely nobodies who prefer drama. there are plenty of people, but they respond to leadership, not drama. WMDE could support LP technical with WMF, but it would require volunteer support. the landing page exists, and is linked to, but again it would require volunteer support, and a grant as was done with Teahouse. the documenting of the process is necessary in order to make clear to the obtuse, just how the project will meet the global policy to the letter, and will not be derailed by other agendas. Slowking4 (talk) 03:20, 15 October 2017 (UTC)
If you want to have a grant to tackle such a problem with a grant funded project, write a grant proposal. The decision of who gets grants to do what isn't done via RFCs or other internal policy documents on Wikidata. ChristianKl (talk) 13:20, 15 October 2017 (UTC)

Example of a problematic bot edit

I thought this is an interesting case of a problematic biographic bot edit: Byron De La Beckwith, a KKK member whose notability/notoriety is the result of murdering a civil rights leader and harassing activists, was described as a "recipient of the Purple Heart medal" by User:PLbot. I'm sure that bot does a lot of good work, but it's a case in point why we need to be careful with descriptions of living people in particular -- determining exactly what belongs in a description and what doesn't requires some human judgment, or a smarter bot.--Eloquence (talk) 23:14, 18 October 2017 (UTC)

I don't see a substantial problem with that description. The living people policy exists to protect the people for whom we have entries and when a bot writes a description that errs by being more favorable than we would otherwise write that's okay. ChristianKl (talk) 23:28, 18 October 2017 (UTC)
It's a fair point that this edit was not harmful to the person (who, in any event, is no longer "living"), but I think it was harmful to readers and re-users nonetheless. These descriptions have powerful real-world consequences -- I personally came across it in the Wikipedia app and had an instant "WTF" reaction, suspecting vandalism (for comparison, Britannica's equivalent description is "American assassin"). But Sjoerd makes the good observation that the bot only bears part of the blame. In the scope of this policy, I do think it's worth treating edits to descriptions with particular care, given that they're often used to encapsulate a subject as a whole.---Eloquence (talk) 23:53, 18 October 2017 (UTC)
Imported from Persondata, not sure if you can blame the bot. Sjoerd de Bruin (talk) 23:36, 18 October 2017 (UTC)
Yes, that's true; PLbot appears to merely have copied it over. Thanks for pointing that out. I don't know what the overall quality of these "short descriptions" from persondata was; this one appears to have been originally added semi-automatically.--Eloquence (talk) 23:53, 18 October 2017 (UTC)
One of the problems with persondata was that it was hidden from view, so that errors, vandalism and statements which became outdated were not corrected. As you've shown, exposing them to more eyeballs is the best way to get such problems fixed. The reason for the semi-automated edit on Wikipedia was that, at the time, the article was in the now-defunct Category:Recipients of the Purple Heart medal; that was added on 4 February 2012, also semi-automatically; presumably because of this prose edit adding the same claim, possibly cited to [1], but that's paywalled. This 2001 Washington Post article, assuming it wasn't sourced from Wikipedia, supports the claim. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:37, 19 October 2017 (UTC)
That makes sense, Andy! Thanks for unpacking how this made it into the description. I have no reason to believe that the claim is false -- it just doesn't belong in the short description of a convicted white supremacist murderer.--Eloquence (talk) 19:41, 19 October 2017 (UTC)
  • @Eloquence: this is a good point, I added some comments regarding care needed for descriptions and aliases on the main page. ArthurPSmith (talk) 13:26, 19 October 2017 (UTC)
"undue weight in item descriptions" is not a living person problem. making factual statements that may be supported by a reference adds to data quality. maybe you should add purple heart back given the reference cited above. Slowking4 (talk) 18:08, 23 October 2017 (UTC)

How do we deal with requests for removal of personal information

The Wikimedia board resolution on BLP says that one principle is supposed to be: "Taking human dignity and respect for personal privacy into account when adding or removing information, especially in articles of ephemeral or marginal interest".

I would translate this into: "Request for removal of information - If the subject of an item requests removal of specific information on the item and that information isn't of public interest, an administrator can delete and oversight it."

@ValterVB: undid this edit with the suggestion that any information that can be publicly sourced is fair game. I think there's plenty of information published on blogs and social networks that can be sourced but where gathering all information about a given person can make the person feel that their privacy is violated. I can get a validly sourced birth date by having two archive.org archived pages that show the age of a person on a forum, which together narrow down their birth date. That doesn't mean the person wants their birthday to be public knowledge; they may have a reasonable interest in getting it removed. ChristianKl () 12:50, 24 November 2017 (UTC)

Not every item has NoValue statements

@Hsarrazin: You changed "date of birth (P569) is missing or less than 115 years ago or date of death (P570) has NoValue" to "date of birth (P569) is missing or less than 115 years ago and date of death (P570) has NoValue" on the grounds that it would declare people who lived in the 12th century to be living. It doesn't. NoValue shouldn't be set for date of death (P570) for people in the 12th century. The clause is in the document so that a person who's 117 years old can be given NoValue for date of death (P570) and thus be marked as a living person for the sake of this policy. ChristianKl () 15:19, 4 December 2017 (UTC)

  • NoValue shouldn't be used for actual people.
    --- Jura 18:42, 4 December 2017 (UTC)
@ChristianKl: the criterion as you stated it is "P569" missing OR "P570" is "novalue"... -> which means "P570" is "novalue" (3rd criterion) is enough... no need to add it to the 2nd criterion… why do you add it to the 2nd criterion with OR?
And I agree with @Jura1: "NoValue" should not be used on P570 for human beings...

--Hsarrazin (talk) 20:18, 4 December 2017 (UTC)

As written, an item is considered to be a living person if all three criteria hold. If just criterion (1) and criterion (3) hold, the person is not considered to be living for the sake of the policy. Given that this usage seems to be confusing and there are arguments against using NoValue here, I rewrote criterion (2) to use floruit (P1317) to allow us to handle the edge cases of people who might live past 115. ChristianKl () 21:08, 4 December 2017 (UTC)
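
The difference between the "or" and "and" wordings of the second criterion discussed above can be sketched as follows. This is only an illustration of that one clause as quoted in this thread, not of the full three-criteria test or of the later floruit (P1317) rewrite; the function names and the whole-year age calculation are assumptions made for the example.

```python
from datetime import date
from typing import Optional

MAX_AGE_YEARS = 115  # threshold quoted in the draft wording


def born_recently(birth: Optional[date], today: date) -> bool:
    """True if date of birth (P569) is missing or less than 115 years ago."""
    if birth is None:
        return True
    # Whole years elapsed since birth; subtract one if the birthday
    # has not yet occurred in the current year.
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    return age < MAX_AGE_YEARS


def criterion_or(birth: Optional[date], death_no_value: bool, today: date) -> bool:
    """'Or' wording: a recent or missing birth date, or P570 set to NoValue."""
    return born_recently(birth, today) or death_no_value


def criterion_and(birth: Optional[date], death_no_value: bool, today: date) -> bool:
    """'And' wording: both conditions are required."""
    return born_recently(birth, today) and death_no_value
```

With the "or" wording, a 117-year-old whose date of death is set to NoValue still satisfies the clause, while a 12th-century person without a NoValue statement does not. With the "and" wording, the 117-year-old no longer satisfies it, and neither does anyone whose date of death is simply absent.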
I don't quite get why NoValue is mentioned at all. Does that mean that if there is a death date, but it's incorrect (person is alive), the policy doesn't apply?
--- Jura 11:25, 7 December 2017 (UTC)

Updated draft

@ChristianKl: has made some good improvements to this draft policy, particularly regarding the actions admins should take when we do receive a request for removal of information. We also have new property classes to indicate the two categories of statements about people that should be treated with greater care. Are we ready to bring this up for another RFC, or is more discussion still needed here? Maybe propose a future date for discussing this in an RFC? ArthurPSmith (talk) 18:33, 4 December 2017 (UTC)

I'm planning to write an RfC. Part of me was planning to wait till my open RfC is closed before I start a new one, but even if that doesn't happen I want to open the RfC this year. ChristianKl () 21:02, 4 December 2017 (UTC)

Channel for users to report requests for removal

I'm currently thinking about whether there's a better way than asking people to come to the admin board or contacting individual admins. Does anyone have other preferences? Ideally something where the person can privately report and then all admins can read the request. ChristianKl () 21:55, 4 December 2017 (UTC)

There is an OTRS queue for Wikidata. Emails sent to info at wikidata.org go there. --Lydia Pintscher (WMDE) (talk) 09:38, 5 December 2017 (UTC)
@Lydia Pintscher (WMDE): Thanks, that looks good. Who has access to the OTRS queue? Can we add a new email address like "privacy@wikidata.org"? ChristianKl () 10:47, 5 December 2017 (UTC)
Some editors have access but I am not sure who. Sorry. I believe new addresses can be added. --Lydia Pintscher (WMDE) (talk) 10:59, 5 December 2017 (UTC)
@Lydia Pintscher (WMDE): From reading the documentation on https://meta.wikimedia.org/wiki/OTRS/Access_policy it seems that there are "Role accounts". Can you look into (or delegate) the question of whether we could configure the system so that all Wikidata Admins automatically have access to an OTRS page that receives whatever gets mailed to "privacy@wikidata.org"?
To me that seems like a good technical solution. Telling people to contact individual admins is problematic given that activity levels among admins differ and some admins might not be interested in handling these requests. If you have any other ideas about how to create the interface to voice privacy concerns, I would also appreciate hearing them.
Given what is written on the OTRS page, there's a mention of Wikidata Staff as a role. Maybe only Wikidata Staff gets to read info@wikidata.org at the moment? If that's the case and we do go down the road of using OTRS for "privacy@wikidata.org", it might also make sense to make info@wikidata.org accessible to Wikidata Admins. I also wouldn't mind giving Wikidata Staff access to "privacy@wikidata.org" as well. ChristianKl () 11:27, 5 December 2017 (UTC)
No, only some editors have access to it. Neither I nor anyone else on my team can read it. It is probably best if you ask one of them about the process and access. Sjoerd maybe? --Lydia Pintscher (WMDE) (talk) 15:56, 5 December 2017 (UTC)
I've received access to the info-wikidata queue after a request on the OTRS wiki. I think users without current OTRS access can do the same on meta:OTRS/Volunteering. I think, if we get more new oversighters in various timezones, we can just handle it at oversight at wikidata.org. Sjoerd de Bruin (talk) 16:05, 5 December 2017 (UTC)
@Sjoerddebruin: Simply getting oversighters in various timezones doesn't seem like an easy task to me, given that it's important to keep the oversight rights to a small number of trusted people. I'm however okay with the information simply going into the current OTRS system. I think it's worthwhile to have a specialized email address, so we can better tell the reason for incoming requests. Even though we currently don't get enough requests for it to be problematic, deciding now on having two email addresses will allow us later to send the requests to different queues if we need to because of an increased volume of requests.
What's the volume of Wikidata related requests at the moment? ChristianKl () 23:56, 5 December 2017 (UTC)

Can we find a way to mark in the references whether something is "widespread public knowledge" or "supplied by the individual themselves"?

I think it would be great if we could find a way to have people mark statements that are "widespread public knowledge" or "supplied by the individual themselves".

Maybe type of reference (P3865) "supplied by the individual themselves"? ChristianKl () 23:09, 4 December 2017 (UTC)

Thinking a bit about this, I think for now specifying a way to do this adds too much complexity, so I think we don't need to specify a notation now. Maybe we experiment with a notation and at a later point revisit the question of whether this policy should relate to it. ChristianKl () 12:48, 6 December 2017 (UTC)

Labels/Description/Aliases

This section is currently very short. It might make sense to expand here on what it means to be neutral. Does anybody have a good idea? ChristianKl () 10:39, 7 December 2017 (UTC)

Hmm, Wikidata does not have a Wikidata:NPOV page. We could reference en:WP:NPOV, which has stuff like "representing fairly, proportionately, and, as far as possible, without editorial bias, all of the significant views that have been published by reliable sources on a topic", but that's not necessarily applicable to labels, I think. ArthurPSmith (talk) 15:54, 7 December 2017 (UTC)
Representing all the significant views isn't what a description is supposed to do, so it would make more sense to say that "neutral" doesn't mean what the enWiki policy says in that paragraph instead of saying that it's supposed to be read that way. ChristianKl () 14:25, 10 December 2017 (UTC)

Deletion

It makes sense to mention deletion of items within the context of administrative actions that could be taken, rather than as a standalone section. Nikkimaria (talk) 18:37, 7 December 2017 (UTC)

If you want to argue that it's an action that could be taken, then it might make sense to put it into the section on administrative actions that could be taken. You however argue that they should be taken. Specific things that should be done each have their own sections in this policy.
One of the advantages is that it makes it easier to vote in the RfC about which actions should be taken. ChristianKl () 14:22, 10 December 2017 (UTC)
I've changed the phrasing to match the other noted administrative actions. Nikkimaria (talk) 14:35, 10 December 2017 (UTC)
Okay, I'm fine with that wording. For explanation: We currently have the case of the Black Lunch Table. There are a few hundred items about living people who just have a name, instance of (P31), sex or gender (P21) and catalog (P972) Black Lunch Table (Q28781198). This means that the notability of the items in question is questionable. I do think that it's reasonable to decide whether we want those items by having a discussion about them, but I see no reason for this policy to say "We should delete those BLT items". If we delete them it can make sense to give the BLT folks a few months to see whether they can add information to make the items notable and there's no reason to create a rush via a Living persons policy. ChristianKl () 19:54, 10 December 2017 (UTC)
except some editors currently delete items without a wikipedia link as "not notable" after 7 days. doubtless they will cite LP to justify that deletion, even if the item is structurally useful, as "presumably notable, per wp:before, but not yet proven". Slowking4 (talk) 22:59, 13 December 2017 (UTC)
WP:BEFORE? Are you proposing we import that principle from English Wikipedia? Nikkimaria (talk) 23:52, 13 December 2017 (UTC)
in effect the english notability is imported already in wikidata notability. (except for the deletionists) if you could write a keepable article then it is notable on both projects. Slowking4 (talk) 02:17, 14 January 2018 (UTC)

Burden

Not sure what is meant by this - the phrasing has nothing to do with bot vs non-bot editing, and so need not depend on any RfC outcomes about bot removal. I think it is vital to include for clarity. Nikkimaria (talk) 19:02, 7 December 2017 (UTC)

The phrase that there's a burden of proof suggests that you can remove all items where a person hasn't proven a statement. If that's actually what is intended, the easiest way is to have a bot that takes over the task of deleting everything where the burden isn't met.
I want to design policy in a way that when there are conflicts between editors the default is a mutual search for common ground. Having a mutual search for common ground means that we have a friendlier environment on Wikidata, and it's not my desire to copy Wikipedia's culture of hostility and deletionism.
I wrote most of the draft language of this article and want to have it in the form I believe in when I start the RfC. To the extent that you have different opinions about how the policy should look, you are free to add options to the RfC once it's ready that outline your desired wording. ChristianKl () 14:01, 10 December 2017 (UTC)
As I said, it's got nothing to do with bots - no one AFAIK (other than perhaps your RfC) has suggested having a bot remove all unsourced statements. It simply means if there is a dispute about including a particular piece of information as a living-person issue then the burden is on the person wanting to include it. Feel free to suggest other ways of phrasing that. And no, just because you're writing an RfC doesn't mean the draft should only reflect what you want it to say. Why not have a "mutual search for a common ground"? (And why have all of those different issues in a single RfC in the first place, and not just "should we accept this as policy yes/no/with amendments XYZ?") Nikkimaria (talk) 14:20, 10 December 2017 (UTC)
In this case, the RfC is the venue where consensus around the policy is supposed to be found. That means that to the extent that there are different views it makes sense to have both views in the RfC to vote on which one is preferred.
To the extent that you claim that nobody suggested we should delete unsourced claims, I guess that means that you haven't read previous discussions about what to do with the usage of a property like ethnic group (P172).
Let's take as an example Karl Marx (Q9061) ethnic group (P172) Jewish people (Q7325). Currently, the claim is unsourced even though we have him as an example for ethnic group (P172) and we consider it important to have this kind of claim sourced. I would want to encourage people to actually source the claim but I don't want to encourage the simple removal of claims like that.
As far as having this as one RfC, I think that's valuable because you actually need the policies in context to have a decent understanding of the effects they are likely to have.
As a sidenote, at the moment the section on dealing with complaints is probably the most unstable because I want to see the OTRS system from the inside before finishing it. There's also an open request for a developer opinion on https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#Performance_effects_of_a_special_flag_for_bots_adding_certain_statements_to_living_people that I want to have answered before putting the actual RfC online. ChristianKl () 20:12, 10 December 2017 (UTC)
I haven't claimed no one has said we should remove unsourced claims - just that doing it en masse by bot is a different matter. But I must disagree with your point about having a single RfC. You could bundle acceptance of this as policy with associated changes necessary to other policies such as blocking. But the alt accounts provision is tangential enough that it should be dealt with separately (whether this policy is accepted or not). As for the potential bots, they're moot if the policy is not accepted, and not required for it to be; they are implementation questions best settled after the policy one is put to bed. Per your point here, best to discuss such options later. Nikkimaria (talk) 20:40, 10 December 2017 (UTC)
talk of burden of proof will be divisive. shifting the burden is a tactic in a battleground. you would do better to talk of seeking consensus of a standard of practice to follow. Slowking4 (talk) 03:17, 11 December 2017 (UTC)
Yep! The whole point of this draft is to develop a community standard of practice to follow. Nikkimaria (talk) 03:42, 11 December 2017 (UTC)
good - then strike that sentence and replace with a consensus process. "Wikidata properties likely to be challenged" created with no consensus whatever. Slowking4 (talk) 19:27, 11 December 2017 (UTC)
You misunderstand me: the policy proposal is a consensus-building process. The RfC is to assess the consensus for the proposed use of that statement, among other details; if you think "Wikidata properties likely to be challenged" should not exist, you will be able to express that feeling, with appropriate rationale, in that discussion. Nikkimaria (talk) 19:53, 11 December 2017 (UTC)
i guess you misunderstand, what i mean by consensus. it is not a feeling, it is a standard of practice. it is not a rationale, it is a process to follow, or not if you choose not to follow it. Slowking4 (talk) 23:03, 13 December 2017 (UTC)
This policy does define a process for adding/removing property likely to be challenged (Q44597997) and property that may violate privacy (Q44601380), given that I expect that as we create new properties and as we add lots of new data we will regularly have to have discussions about whether to add or remove those classes. I don't expect the actual text of the policy to change as frequently. ChristianKl () 12:15, 12 December 2017 (UTC)
If you think that we shouldn't use a bot to ask people en masse to fulfill the burden of providing evidence, what criteria would you expect a human editor who removes statements and asks people to fulfill the burden to use that would make their activities qualitatively (and not only quantitatively) different? ChristianKl () 00:00, 13 December 2017 (UTC)
Humans use judgement rather than criteria, whereas bots can only use the latter. The primary purpose of the phrase at issue is to address what happens when a statement is actually "challenged". Nikkimaria (talk) 00:16, 13 December 2017 (UTC)
Humans judge by criteria. Humans can use criteria that are more complex than those of bots, but in that case it would still be possible to describe the criteria in words. Someone might say "I'm challenging all claims that currently don't have sources". I don't think that the policy should indicate that this is a valid move. To use your phrasing, I think that case-by-case judgement is required to analyse what the burden of proof looks like. ChristianKl () 13:20, 13 December 2017 (UTC)
Or just common sense. I'd expect someone who mass-challenges obviously-true-yet-unsourced claims of "instance of: human" to be shot down pretty quick, but if someone has a good-faith concern about potentially controversial or privacy-violating claims being added then requiring decent sourcing for their inclusion is fair. Nikkimaria (talk) 14:25, 13 December 2017 (UTC)
Mass removing instance of (P31) human (Q5) is a strawman; as I said above, mass removing all unsourced ethnic group (P172) for living people would be a more central issue. I think the policy as it currently is is powerful enough to be used with common sense without having a sentence about burdens. ChristianKl () 14:39, 13 December 2017 (UTC)
Disagree. As above, the purpose of the phrase is not to address large-scale bot-like changes, but rather disputes; if you'd like to suggest a different wording please do so. Nikkimaria (talk) 14:54, 13 December 2017 (UTC)
Policy shouldn't be judged on the intent but by the effect. I think the existing wording is fine and your approach problematic. You seem to be the only person who thinks that my draft is problematic in this regard. It's also worth noting that previous RfCs (see https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Findagrave_removed_as_a_source_for_information) indicate that you have a minority position when it comes to how we should go about deleting content. If you believed that your position here would find general support, it would be easy to follow the suggestion of adding it as a point to the RfC. ChristianKl () 13:06, 14 December 2017 (UTC)
No, it wouldn't, since per my previous comments I think we should avoid having a million options for people to vote on in a single RfC. I've tweaked the wording to try to address your concern. Nikkimaria (talk) 13:46, 14 December 2017 (UTC)

Link to English page

Is it possible to avoid linking to an English page like this? Normally people don't follow all the pages of all the wikis, and if someone changes that page they may not be notified. I prefer to keep the page here, where interested people can add it to their watchlist. --ValterVB (talk) 08:29, 9 December 2017 (UTC)

Is it better to link to the Wikidata item? It'd be cool to link to Special:GoToLinkedPage but I don't think it's possible to put the user's own preferred language into that (on the fly). Sam Wilson 13:17, 9 December 2017 (UTC)
I prefer a page here, where users can know when a page is changed by seeing it in Recent Changes or their Watchlist on Wikidata. --ValterVB (talk) 14:18, 9 December 2017 (UTC)
I'm also against linking out to enwiki policy like that, but linking to our Wikidata item about it is fine until we have our own policy. Additionally, I think a sentence about what it means for a Wikidata item description to be neutral would be well placed in that paragraph. ChristianKl () 14:22, 10 December 2017 (UTC)
I do not agree. I prefer to import the page and change it little by little here. I want to avoid something like "Why use the policy on en.wiki? de.wiki is better", etc. If we have a page here, it is "our page", not "a page of another wiki". If the page is here, people can't say "I'm only on Wikidata, I don't follow other wikis, so I can't know if the rule has changed", and so on. --ValterVB (talk) 15:32, 10 December 2017 (UTC)
I just went and reread the EnWiki policy and I don't think referencing it even indirectly helps, given that EnWiki just doesn't have a concept of a "description" and we don't want people to put all significant viewpoints in a description. ChristianKl () 19:58, 10 December 2017 (UTC)
I meant to say that not even the link to the item is a good choice, not only in the example that I linked but in general --ValterVB (talk) 20:12, 10 December 2017 (UTC)
given that they are wholesale cutting / pasting english policy here, it is good to be honest about that. but you are right. better to build consensus in this community rather than trying to cram down english policy yet again. Slowking4 (talk) 03:12, 11 December 2017 (UTC)
It's one thing to copy-paste policy, it's another to link out and make the authoritative version of the policy be hosted with EnWiki. Copy-pasting at least means that we are free to change the copy-pasted version of the policy that we host. ChristianKl () 11:33, 12 December 2017 (UTC)
@Slowking4: As far as this article goes, the paragraph "Non-item space" is an adapted version from EnWiki. If you have any suggestions for improving it, I'm happy to hear them. ChristianKl () 11:34, 12 December 2017 (UTC)
i would strike the entire section. i do not see evidence of LP problems in user space. you are importing the English prescriptive practice about user pages. existing policy about disruption can cover. Slowking4 (talk) 23:09, 13 December 2017 (UTC)

First impressions of OTRS

I just got my OTRS access. It listed three open tickets for Wikidata. The oldest open ticket was 112 days old without getting addressed. I'm not exactly sure whether the problem is poor UI or whether there are simply not enough reviewers for Wikidata. It seems like I need another day to get access to the OTRS Wiki to get a better idea. ChristianKl () 12:12, 12 December 2017 (UTC)

I settled a bit into the OTRS structure and successfully requested privacy@wikidata.org to be added as an additional email address. If we get more requests via this channel we will need more Wikidata admins to have OTRS accounts, but that seems like a solvable problem. Currently, the Wikidata queue has a total of 101 items that it received over the 5 years of Wikidata's existence, and some of that is spam. ChristianKl () 14:27, 13 December 2017 (UTC)

Should we add a way to tag removed claims to prevent them from being readded?

I'm thinking about the question of what to do when we remove information because of a request from the subject. One way would be to add "Unknown value" with a qualifier that indicates that the information was removed because of a request. What do you think? ChristianKl () 13:16, 13 December 2017 (UTC)

seems like a good idea : something like criterion used (P1013) - "removed by request of the subject of the item" (provided it is not "public" of course)... and maybe a link to the request database (like OTRS ticket on commons) ? --Hsarrazin (talk) 13:27, 13 December 2017 (UTC)
I'm not aware of how commons does it. Can you link me to a good example and/or the policy describing it? ChristianKl () 14:40, 13 December 2017 (UTC)
I'm not an OTRS user or admin, but [2] and more specifically, for an image that has been checked with a link to the ticket. - this would probably need an OTRS ID property. --Hsarrazin (talk) 15:00, 13 December 2017 (UTC)
Yes, we would need some OTRS property. I see two possibilities: (1) "removed because of OTRS ticket", (2) "OTRS ticket ID". The first would have the advantage that it's less effort to just add one property. ChristianKl () 15:10, 13 December 2017 (UTC)
That would be an excellent tool to allow trolls to quickly find removed data in items' histories. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:17, 14 December 2017 (UTC)
tags are an english solution, perhaps you meant query? or superprotect? Slowking4 (talk) 19:57, 15 December 2017 (UTC)
I'm not sure what your point happens to be. I used tag as a verb. Could you explain your issue? ChristianKl () 00:08, 21 December 2017 (UTC)
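
To make the "Unknown value plus qualifier" idea from this thread concrete, here is a minimal Python sketch of what such a statement might look like in the Wikibase JSON layout, where "unknown value" is expressed as the snak type `somevalue`. Everything specific is an assumption for illustration: the helper name is hypothetical, the item ID for "removed by request of the subject of the item" is a placeholder because no such item is named in the discussion, and no OTRS property is included because none existed at the time. It does not address the concern raised above that such a marker makes removed data easier to find in an item's history.

```python
import json


def removed_by_request_statement(property_id: str,
                                 qualifier_property: str,
                                 reason_item_id: str) -> dict:
    """Build a statement with an 'unknown value' main snak and one item-valued qualifier."""
    return {
        "type": "statement",
        "rank": "normal",
        # "somevalue" is the snak type for "unknown value"; it carries no datavalue.
        "mainsnak": {"snaktype": "somevalue", "property": property_id},
        "qualifiers": {
            qualifier_property: [
                {
                    "snaktype": "value",
                    "property": qualifier_property,
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": reason_item_id},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # P569 = date of birth, P1013 = criterion used (both mentioned on this page).
    # "Q_PLACEHOLDER" stands in for a reason item that would still have to be created.
    statement = removed_by_request_statement("P569", "P1013", "Q_PLACEHOLDER")
    print(json.dumps(statement, indent=2))
```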

This proposal is a bad thing

Obviously there is a need to deal with living people. However, this proposal will prove to be highly detrimental to diversity. It will stop the growth of information and it is back to the bad old days of Wikipedia thinking instead of thinking in terms of data, sources and quality through comparison.

  • In a paper about Wikidata, it was mentioned that the implementation of constraints had a detrimental effect on the diversity of the Wikidata data.
  • We are getting to a stage where we can compare data. When we compare what DBpedia has to say about the statements with what we know that is relevant in a BLP way, and the data is the same, we know that we are in sync with (a) Wikipedia. It is particularly where there are differences that we want to consider the requirement for sources. This is exactly where our time is well spent.
  • When we add data from Wikipedia, we know that the quality of Wikipedia is often so so. Improving on the shared data in both Wikipedia and Wikidata should be done in a smart way.
  • Sources known in Wikipedia are acceptable in sources in Wikidata
  • We do not have methods to invalidate sources.

Thanks, GerardM (talk) 20:17, 6 January 2018 (UTC)

What do you mean by "Sources known in Wikipedia are acceptable in sources in Wikidata"? Nikkimaria (talk) 22:25, 6 January 2018 (UTC)
When a fact comes from a Wikipedia, we can expect that there is a source. Thanks, GerardM (talk) 19:10, 7 January 2018 (UTC)
Ideally that would be true, in practice it isn't. Nikkimaria (talk) 01:04, 8 January 2018 (UTC)
"In a paper about Wikidata, it was mentioned that the implementation of constraints had a detrimental effect on the diversity of the Wikidata data." Which paper do you mean? ChristianKl 23:17, 6 January 2018 (UTC)
yes, i agree. this proposal is the same old adversive, directive bag of tricks imported from other wikipedias, unhinged from database quality assurance. there are policies appropriate to databases and data sets, too bad they are not incorporated here. Slowking4 (talk) 02:10, 14 January 2018 (UTC)
@Slowking4: You speak a lot about that topic but you haven't made any concrete suggestions besides advocating for principles (and thus that somebody else should make concrete policy based on those principles). If you have concrete policy suggestions, why don't you write them up? ChristianKl 02:54, 14 January 2018 (UTC)
ok Wikidata:Living people (draft 2) see also Wikidata_talk:Living_persons_(draft)#ACM_on_transparency_and_accountability above. as you might note, i have a fundamentally different way of managing projects or policy. doubt we will agree on any means and methods here. Slowking4 (talk) 03:17, 14 January 2018 (UTC)

GDPR - General data protection regulation

On May 25, 2018 a new European regulation takes effect. We should, like any enterprise or organisation in the world dealing with EU citizens, apply those rules. Any data item that could possibly identify a living person is privacy sensitive. This extends to, e.g., name, photo, telephone number, address, bank account number, e-mail address, IP address, etc. - Geertivp (talk) 15:10, 10 January 2018 (UTC)

What practical steps do you think those rules imply for Wikidata that aren't currently taken? ChristianKl 02:52, 14 January 2018 (UTC)