Wikidata:Project chat/Archive/2020/02


United Nations Security Council resolution (Q877358) and full work available at (P953)

Now that UN document symbol (P3069) exists, would anyone here mind if I remove all 'full work available at' statements from instances of United Nations Security Council resolution? --Trade (talk) 17:03, 1 February 2020 (UTC)

@Trade: remove? We have robots to replace that, see Property talk:P973 for examples. Multichill (talk) 22:38, 2 February 2020 (UTC)
@Multichill: I gave it a try --Trade (talk) 00:00, 3 February 2020 (UTC)

Wildly variant genealogies?

I thought I knew how to deal with genealogical variants: find the most authoritative source, use that as a backbone, then note variant parents / spouses / children / dates with sourcing, while giving the apparently authoritative version "preferred" rank.

But what if the genealogy from a particular source is *wildly* different -- eg keeping the same main sequence of names (more or less), but systematically assigning them the dates, spouses, and collateral children that in other sources are assigned to somebody else -- essentially giving the name a totally different bio, so it's not just a question of a couple of statements different, but instead more like a completely different person being attached to the name, so different is the presented tree.

I was merging apparently duplicate items created with the The Peerage person ID (P4638) import when I hit this, and now I'm not sure what to do.

Do I try to merge by the by-names (the first names all being "Ulick"), essentially mashing together two completely different biographies? Do I try to merge according to the biographies, where this seems possible, ignoring that they've been attached to different names? Or do I try to maintain both versions in the database unmerged, in which case what properties should I use to connect the different biographical "versions" attached to a particular name, and/or how do I indicate that an entire "version" (ie one particular item for that name) is to be preferred over the second one ?

I am tempted to use something like partially coincident with (P1382) to link items for two contending versions of lives attached to the same name; but I'm not sure it's appropriate for radically different incompatible reconstructions of the truth, as opposed to variant descriptions or terminology for a single reality. Nor am I clear how I would indicate that the other version should probably be the preferred version. Jheald (talk) 18:14, 1 February 2020 (UTC)

@Jheald: This means a conflation, see Help:Conflation of two persons. The Peerage item should be tagged as conflation (Q14946528) and have all data deprecated.--GZWDer (talk) 18:48, 1 February 2020 (UTC)
@GZWDer: Neat. Yes, that seems a very good way to deal with it. So how to relate the conflation to the underlying people (conflation "of" ?) on the conflated item, and how to link to the conflated item from the "good" items? ("different from" X + "object has role" = "conflation" ?) Jheald (talk) 18:55, 1 February 2020 (UTC)

--GZWDer (talk) 19:03, 1 February 2020 (UTC)

GZWDer is there a good way to indicate which items are conflated in the ID of interest? For a non-Peerage example, the Internet Broadway Database ID of Louise Allen (Q70776916) conflates the actress who flourished in the 1920s and 30s with her aunt of the same name (Louise Allen (Q70760915)) who was born in 1873 but who died in 1909. In cases where an External ID conflates two or more distinct items, would partially coincident with (P1382) or partially coincident with (Q78694451) be added as a qualifier to the deprecated ID, or are there better ways? -Animalparty (talk) 23:37, 2 February 2020 (UTC)

How do I post my logo to my newly created item profile

Waw*Mart (Q84203597) – The preceding unsigned comment was added by Waw mart2 (talk • contribs) at 07:15, 2 February 2020 (UTC).

Waw mart2, if the logo is released under a free license, or is otherwise public domain, you first need to upload it to Wikimedia Commons (make sure it is under a compatible license: see Commons:Licensing). You can then add it to Q84203597 as logo image (P154). And, whether you are the subject of the item or not, please review Wikidata:Notability and Wikimedia FAQ on paid contributions without disclosure. Be prepared for push back under an actual or perceived conflict of interest. Cheers, -Animalparty (talk) 23:59, 2 February 2020 (UTC)

Same data for P18 and P2716

Please move collage image (P2716) close to (just below) image (P18). For some items the same image file is used for both, and it is hard to notice when they are far from each other... Or just add a notification (error message) when someone tries to add an image that is already used in the other property on the same item. --Termininja (talk) 08:54, 2 February 2020 (UTC)

You can use
select ?item ?photo where {?item wdt:P2716 ?photo . ?item wdt:P18 ?photo}
to find them, I believe? Edoderoo (talk) 11:08, 2 February 2020 (UTC)
I know how to use SPARQL, the problem is about the item interface (example Heterobranchia (Q133143)) --Termininja (talk) 11:58, 2 February 2020 (UTC)
This is controlled by MediaWiki:Wikibase-SortedProperties - best place to request this be moved is there. In general it looks like P18 goes at the top, or some properties which probably get used instead of P18 (eg logo image), and then there's a generic group of "other images of the subject" lower down, including P2716 (except for P1442, image of grave, which is filed alongside date of death). Andrew Gray (talk) 20:18, 2 February 2020 (UTC)

Years ago I suggested that we should have a 'should not have the same value as' constraint. And a 'should have the same value as' one as well. They are still missing, unfortunately. Thierry Caro (talk) 17:38, 2 February 2020 (UTC)

User Account Verification

I think there are users here on Wikidata who also use their account for work, or who edit Wikidata mostly as their job. In that case it is paid editing. The section "Paid contributions without disclosure" of the Wikimedia Foundation's Terms of Use states that users who receive compensation for any of their edits must disclose this on their user page, on the talk page of the edited article or item, or in the edit summary of every contribution for which they receive money. I think this is important. So that more people can read and understand it, it would be great to have a template for this, with the information available in languages other than English. -- Hogü-456 (talk) 20:41, 2 February 2020 (UTC)

Got any examples of paid editors working on Wikidata? --Trade (talk) 00:11, 3 February 2020 (UTC)
Since they haven't been required to declare here, that would be hard to identify in advance, but I bet if you look at the self-declared paid editors on the English-language Wikipedia you will find people who have edited Wikidata, and it's a pretty safe bet that they've done that in their paid capacity. - Jmabel (talk) 01:57, 3 February 2020 (UTC)
I check new company items frequently and see many suspected paid editors there. Usually these are items from new users about (new) non-notable companies (and their founders) with a description in promotional style. I usually propose these for deletion. See for example Badassentrepreneur and Wizkidoftheinternet.--Jklamo (talk) 02:26, 3 February 2020 (UTC)
@Jklamo: I've made a similar page with websites instead of companies. Might be worth keeping an eye on. --Trade (talk) 00:29, 4 February 2020 (UTC)

Commons link should exist

At David Kettlewell (Q50311931) there is a ! warning that "Commons link should exist." Exactly which commons link is required? Is the warning because there is a category link in the slot for commons? --RAN (talk) 23:54, 2 February 2020 (UTC)

That's an error that plagues every Commons Creator page statement. Ignore it. -Animalparty (talk) 00:20, 3 February 2020 (UTC)
Can we remove: item requires statement constraint:property:Commons category? I am not sure it serves any purpose. --RAN (talk) 02:32, 3 February 2020 (UTC)

Mismatched reference: first version to be deployed this week

Mismatched reference notification preview

Hello all,

As announced last month, we’ve been working on mismatched reference, a new feature that alerts users when editing a value without changing the existing attached reference. This feature has been tested over the past month. Based on the positive feedback we received, we are now able to move forward and enable the feature on Wikidata. This will take place this week, in two different steps:

  • Today around 13:00 UTC, you will be able to see a notification (similar to the constraint ones) after saving an edit
  • On Thursday, February 6th, we will also enable a button that will allow you to hide the notification if you think that the reference is not mismatched

Please note that for now, the feature is not persistent: the editor who made the change will see it appear when they save their edit, but if they reload the page, the notification will be gone. Other users also won’t be able to see it. We are considering adding persistence in the future.

If you want to give feedback about the feature, feel free to use this talk page. If you want to report an issue directly on Phabricator, feel free to use this form. Cheers, Lea Lacroix (WMDE) (talk) 11:22, 3 February 2020 (UTC)

signature (P109)

I have added a signature in Hebrew (signature (P109)) to Moshe Dayan (Q188783), which already contains a signature in English, but got the problem tag single value constraint (Q19474404). I think that more than one language signature should be allowed. Geagea (talk) 13:14, 3 February 2020 (UTC)

@Geagea: I've amended the single value constraint on P109 so that language of work or name (P407) acts as a separator, to deal with the commonplace issue of a person having different signatures for different languages. The constraint no longer triggers on ... thanks for spotting this. --Tagishsimon (talk) 13:36, 3 February 2020 (UTC)
Thanks Tagishsimon. Geagea (talk) 15:18, 3 February 2020 (UTC)

Wikidata weekly summary #401

Scaling WDQS

I made a grant proposal with an initial project plan including a potential solution. Please come and visit the grant page for a future-proof WDQS and discuss it in the talk page.

Withdrawn as of 30 January 2020. – 05:31, 4 February 2020 (UTC)

Limit for dates in Gregorian

What about allowing a date to be set to the Gregorian calendar only for years after 1582? Also, the last year for setting a date to the Julian calendar could be 1923, as Greece was the last European country to adopt the Gregorian calendar, in 1923. Xunonotyk (talk) 09:31, 30 January 2020 (UTC)

I think the en:Proleptic Gregorian calendar may be in use in some fields. Ghouston (talk) 09:34, 30 January 2020 (UTC)
Since 0000-03-01 depending on my mood, because an ISO year 0000 is one thing, but deciding on leap year or not goes too far. An ideal switch from proleptic Gregorian to Julian could be the century when both calendars agreed. 2nd or 3rd century—I could check it in a vintage 2006 template on Meta.:-) 05:45, 4 February 2020 (UTC)
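The century in which the two calendars agree can be checked with the standard day-offset formula between the Julian and the proleptic Gregorian calendar. The following is a minimal sketch in Python; the function name is made up for illustration, and it assumes the year is given as an astronomical year number and that the date falls on or after 1 March of that year (for January and February, pass the previous year).

```python
def julian_gregorian_offset(year: int) -> int:
    """Days by which the proleptic Gregorian date is ahead of the Julian date.

    Assumption: the date lies between 1 March of `year` and the end of
    February of the following year. A negative result means the Gregorian
    date is behind the Julian one.
    """
    # One day of drift per century year, except century years divisible
    # by 400, which are leap years in both calendars.
    return year // 100 - year // 400 - 2


if __name__ == "__main__":
    for y in (200, 299, 300, 1582, 1923):
        print(y, julian_gregorian_offset(y))
```

Under these assumptions the offset is zero from March 200 to February 300, which fits the "2nd or 3rd century" guess above, and it gives the familiar 10 days for 1582 and 13 days for 1923.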

Editing 5 books


I just joined Wikidata. I would like to manually update 5 records:

with information from the editions that we publish:

What does the community recommend?

Cinna Babu (talk) 19:52, 3 February 2020 (UTC)

@Cinna Babu: Follow the advice at Wikidata:WikiProject Books#Edition item properties and create new items for your editions based on the table there. --Tagishsimon (talk) 20:26, 3 February 2020 (UTC)
@Cinna Babu: for an example of what a book edition (Q57933693) can look like, check out Be Loved (Q84083827) --Trade (talk) 23:58, 3 February 2020 (UTC)

Thanks for the pointers! I just created my first Wikidata page: (Q84361166)

I was not able to add the translator: Brian Dana Akers

I was not able to add the publisher:

Not sure how to deal with the languages. The book is a bilingual Sanskrit-English text.

00:24, 5 February 2020 (UTC)

Quality of referencing required to support relationship

Hi. What level of referencing is required to support a claim of family relationship? I have been working on tidying some records related to the Crowdy family. I've been able to link three sisters together, Edith Frances Crowdy, Isabel Crowdy and Rachel Crowdy, with two sources saying they are the daughters of James Crowdy (solicitor) and Mary Isabel Ann Fuidge. [1][2] Meanwhile, James Fuidge Crowdy is stated by a forum post on a genealogy website to be the child of James Crowdy (solicitor) and Mary Isabel Ann Fuidge.[3] All four of the potential siblings were alive during the same period. Is the forum post considered reliable enough for Wikidata or is there a higher standard that has to be achieved? From Hill To Shore (talk) 22:36, 3 February 2020 (UTC)

I don't know of any relevant Wikidata policy; indeed Wikidata has been resistant to requiring decent sources. I have done genealogical research for my own family, and found user-contributed information on genealogy sites is often based on wishful thinking. Also, Wikidata is structured to provide a reference to a reliable source that plainly states the fact in question. Sources that are an essay about why something might be true are not suitable. Jc3s5h (talk) 23:19, 3 February 2020 (UTC)
Currently we use "reference_url=" with "stated_in=Oxford Dictionary" and even "quote=.... one of the four daughters of James Crowdy, solicitor, and his wife, Mary Isabel Anne Fuidge" if there is a sentence that can be cut and pasted. We have multiple entries for some birthdates based on genealogical sources handled that way. If new, more reliable info comes along you can deprecate the bad value. Don't forget to use the Wikidata family tree template if there is an entry at Wikimedia Commons. See Anton Julius Winblad I (Q20667111) and click through to Commons to see the chart. Also I recommend getting a free account at Familysearch, where you can enter a new person, if not already there, and link to Wikidata. If you visit Familysearch, you will see that I added the family and attached them to the various censuses for England as well as extant birth, marriage, and death records, and linked to Wikidata for you. Good to see someone else working on connecting family data here. --RAN (talk) 01:27, 4 February 2020 (UTC)
@Richard Arthur Norton (1958- ): Thanks for the advice and sorting through those Crowdy records. It is a really useful perspective to just post information with whatever reference is available and then replace it later if higher quality references materialise. Coming from English Wikipedia, I'm used to much more stringent policies. I'm happy to adapt to how things are done here though.
By "Wikidata family tree template" do you just mean the Commons:Template:Wikidata Infobox used in Commons categories?
I don't think I will sign up to Familysearch. Submissions aren't covered by Creative Commons and I am not comfortable giving a religious organisation ownership of my work, especially where they have specific monetised purposes for the data listed in the site structure. I'm coming at this solely from the perspective of linking related Wikipedia subjects together through Wikidata rather than producing Wikidata for genealogy; if someone wants to take my work (released under Creative Commons) and put it on Familysearch then that is their choice. From Hill To Shore (talk) 13:19, 4 February 2020 (UTC)
@From Hill To Shore: Genealogical information is not covered by copyright; it does not meet the threshold of originality. Birth, marriage, and death information is available in public documents. Only if you synthesize that information into a prose biography using original research such as interviews or information in personal letters would it be capable of earning copyright protection. I don't think any single Wikidata entry is copyrightable. All of Wikidata as a data set would be, because of the "sweat of brow" concept, just as a telephone book entry is not copyrightable, but the entire book may be. In Familysearch I am just clicking on the documents that belong to the family, like the census and the birth, marriage, death records so that they are linked to a specific person. I don't know what "specific monetised purposes for the data listed in the site structure" means. They do not charge for the service, nor do they sell anything. I am not sure why you think a posting in a newsgroup is ok, but are wary of the actual primary source documents that have been scanned and indexed by volunteers. I am not a member of the LDS, I just use their documents and link to them here at Wikidata. Registration is required because there is information on living people in the census data. --RAN (talk) 13:41, 4 February 2020 (UTC)
Let us just leave it at, "I am not comfortable with supporting the activities of any religious organisation (of any faith)." This is a topic area that could become messy if other editors decide to become involved, so best not to continue on that theme.
In terms of quality of data, I have not said that the forum post is in any way superior to primary sources. I started this thread specifically because I was unsure whether we could use a forum post. I have seen a lot of unsourced or poorly sourced material here and was trying to see if there is a minimum standard that we need to achieve. From Hill To Shore (talk) 13:53, 4 February 2020 (UTC)

Editing the header for a Wikidata entry

Every once in a while when I am editing the titles for a Wikidata entry, I am brought to the screen where all the "also known as" entries are separated by a vertical bar, which was the entry screen during creation. How can I get there on purpose, so that I can edit "also known as" entries that have overflow text? There is a glitch that blocks entries with overflow text from being edited, but when I am mysteriously propelled to the creation edit screen I can make corrections because the field is longer. What magic can I use to go there on purpose? --RAN (talk) 01:16, 4 February 2020 (UTC)

@Richard Arthur Norton (1958- ): That’s the special page Special:SetLabelDescriptionAliases, which is the link target of that “edit” link. Usually, left-clicking the link will only follow it (and open the page) if the click happens early, before the page’s JavaScript has loaded; however, you can always follow the link in other ways, e. g. via right click > open link in new tab or (if you have a three-button mouse) middle click. And you can also go to Special:SetAliases directly, to set just the aliases without label and description – you can add the entity ID and language code in the title, e. g. Special:SetAliases/Q4115189/en. (Warning: those special pages don’t support editing aliases which themselves contain vertical bar characters, see phabricator:T219499.) And the issue with being unable to edit overlong aliases normally is phabricator:T234804. --Lucas Werkmeister (WMDE) (talk) 10:54, 4 February 2020 (UTC)
@Lucas Werkmeister (WMDE): Wow! Thanks, the right click workaround is perfect. --RAN (talk) 20:23, 4 February 2020 (UTC)
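For anyone who wants to jump to that special page directly, the title pattern described above (Special:SetAliases/Q4115189/en) can be assembled mechanically. This is only a sketch: the helper name is hypothetical, the base URL is assumed to be the usual www.wikidata.org/wiki/ prefix, and the two regular expressions are rough sanity checks rather than an official definition of valid IDs or language codes.

```python
import re

# Assumed base URL for Wikidata page titles.
WIKIDATA_BASE = "https://www.wikidata.org/wiki/"


def set_aliases_url(entity_id: str, language: str) -> str:
    """Build a link to Special:SetAliases for one entity and one language."""
    # Rough check: item (Q...) or property (P...) IDs only.
    if not re.fullmatch(r"[QP][1-9]\d*", entity_id):
        raise ValueError(f"unexpected entity ID: {entity_id!r}")
    # Rough check: codes such as "en", "de-ch" or "zh-hans".
    if not re.fullmatch(r"[a-z]{2,}(-[a-z0-9]+)*", language):
        raise ValueError(f"unexpected language code: {language!r}")
    return f"{WIKIDATA_BASE}Special:SetAliases/{entity_id}/{language}"


if __name__ == "__main__":
    print(set_aliases_url("Q4115189", "en"))
```

The caveat from the reply above still applies: these special pages do not handle aliases that themselves contain a vertical bar.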

Add instance of Jessica Harris, podcaster, film producer, director

For consideration, I would like to add the following data:

  • instance of: human
  • sex or gender: female
  • country of citizenship: United States of America
  • given name: Jessica
  • family name: Harris
  • birth name: Glass
  • occupation: radio personality, podcaster, documentary film producer, documentary film director
  • educated at: Harvard College, Columbia Business School
  • official website:
  • References: IMDB, Wikipedia

Thank you, Brian Williams Ploughdeep (talk) 03:59, 4 February 2020 (UTC)

@Ploughdeep: I've made a start at Jessica Harris (Q84332670) and note also From Scratch (Q5505508) --Tagishsimon (talk) 04:19, 4 February 2020 (UTC)
@Tagishsimon: Thank you very very much! -- Ploughdeep (talk) 02:48, 5 February 2020 (UTC)

Wikidata = quick-data

Just recalled that "wiki" is meant to mean "quick" .. so "QuickStatements" is just wiki-statements? --- Jura 20:51, 4 February 2020 (UTC)

Structured data on WikiSource - any takers?


I'm a WikiSource editor working on the US Statutes at Large [4] from the 1904/05 Congressional session. There are thousands and thousands of federal pension grants and increases in these pages, so of passingly important social history/military history/PhD/genealogical value to somebody, and I'm pretty sure the Library of Congress isn't going to cover these in its digitisation project because they are technically Private Acts.

For logistical reasons, I have had to develop a suite of templates for proofreading, which means that all this lovely data now accidentally has a simple structure. In its current state, therefore, the data set is conceivably only a step away from being machine-readable/searchable (each pension has at least the following fields: date=, grantee=, quality=, rate=, gender=). Once the proofreading is finished on WS, we will just subst: the whole lot to avoid Template limits, whereupon it just goes back to being in a sense "dumb" data again. Does anyone have any ideas about who might want to host such data? Is it WikiData's bailiwick? Is there a potential SMW hoster out there somewhere? It would be such a shame to lose all the carefully applied data structure which should be of value to someone... CharlesSpencer (talk) 14:54, 4 February 2020 (UTC)

  • How many entries are we talking about? What's written in the quality and rate fields? ChristianKl❫ 16:32, 4 February 2020 (UTC)
    • @ChristianKl: Rate is dollars per month in words only - generally something between 8 bucks and 30 bucks per month, I assume depending on a combination of rank of the pensioner and how well he/ his widow knew his congressman/senator!
    • Quality is generally something like “late Captain, Company X, New York Volunteer Infantry”, sometimes “widow of Joe Bloggs, late Captain [etc. etc.]”. For some of the more obscure engagements and/or units, there might be a terminal qualifier such as “war with Mexico” or “Seminole Indian disturbances”. Once it’s all been fully proofread, it could potentially be very interesting to further break down the “Quality” data into rank, unit (sometimes units plural), and maybe campaign (where specified). Who knows, you might even be able to develop a full nominal roll of “Captain Whosit’s Company, Wisconsin Irregular Militia, War with Mexico”!
    • And there are (at a guess) almost a thousand pages at 4-5 acts per page (99% of them pensions) just in Vol. 33 - there are probably tens of thousands over the years - a potentially fascinating social history resource. CharlesSpencer (talk) 00:31, 5 February 2020 (UTC)
  • @CharlesSpencer: Oh, this is interesting! First of all, it might be worth just taking the whole lot, converting into a TSV or similar plain text file, and posting the whole lot on figshare or another repository - this might be of interest to a researcher in its simplest form. I think we could definitely do something with these here, though, and a private act still seems notable and substantial enough to include.
In terms of data structure, we'd be able to use something like:
Title: An act granting a pension to XYZ (I presume these are the standard titles)?
instance of (P31): private act of the United States Congress (might as well create a new subclass of Act of Congress (United States) (Q476068))
legislated by (P467): 58th United States Congress (Q4640944), with qualifier point in time (P585) for date of approval
publication date (P577): date of approval
legal citation of this text (P1031): formal citation calculated from the structured data
full work available at (P953): Wikisource link (unless you will have one WS page per private Act, in which case we can use a sitelink)
We wouldn't easily be able to make use of the rate ($30), gender, or quality ("widow of...") fields, unless we tried creating items for each recipient. Which wouldn't be impossible, but might be a bit unwieldy since we don't really know anything about them other than that they were granted a pension on a specific date. Andrew Gray (talk) 20:59, 4 February 2020 (UTC)
A possible solution is to annotate the HTML outputted by the templates with Microdata. It is what we do in the french Wikisource header templates (page example). Then you could use an HTML/microdata scraper to extract all the embedded metadata. Tpt (talk) 21:38, 4 February 2020 (UTC)
@Andrew Gray: @Tpt: I’m all at sea in Wikidata land, so I only wanted to alert people who know what they’re doing to make best use of my accidentally structured data - I’m afraid I have more than my hands full myself with the proofreading! And don’t worry - I shan’t be :subst-ing anytime soon. Please make use of the WS data to your hearts’ content! CharlesSpencer (talk) 00:31, 5 February 2020 (UTC)
Excellent - I'll have a think about how best to do this and let you know how it goes :-) Andrew Gray (talk) 00:32, 5 February 2020 (UTC)
It seems to me that we do know at least a bit about the person: "spouse of X", maybe with <SomeValue> (stated as...). When the person served as a captain in the New York Volunteer Infantry, that's also enough information for an item about the person. ChristianKl❫ 08:21, 5 February 2020 (UTC)
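The TSV export suggested earlier in this thread could be sketched along the following lines. This is an illustration only: the template name "pension" is a placeholder (the actual Wikisource template names are not given in this discussion), the regular expression only handles simple, un-nested template calls, and the sample row is invented. Only the five field names (date, grantee, quality, rate, gender) come from the thread.

```python
import csv
import io
import re

# Field names as listed in the thread.
FIELDS = ["date", "grantee", "quality", "rate", "gender"]

# Hypothetical template name; matches un-nested calls like {{pension|a=1|b=2}}.
TEMPLATE_RE = re.compile(r"\{\{\s*pension\s*\|([^{}]*)\}\}", re.IGNORECASE)


def extract_pensions(wikitext: str) -> list[dict[str, str]]:
    """Collect the named parameters of every matching template call."""
    rows = []
    for match in TEMPLATE_RE.finditer(wikitext):
        row = dict.fromkeys(FIELDS, "")
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep and key.strip() in row:
                row[key.strip()] = value.strip()
        rows.append(row)
    return rows


def to_tsv(rows: list[dict[str, str]]) -> str:
    """Serialise the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample, not taken from the Statutes at Large.
    sample = ("{{pension|date=March 3, 1905|grantee=Jane Doe"
              "|quality=widow of John Doe, late captain"
              "|rate=twelve dollars|gender=f}}")
    print(to_tsv(extract_pensions(sample)))
```

Running something like this before the templates are subst:-ed would preserve the structure in a plain file that could be posted to a repository, independently of whether the acts later get Wikidata items.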

Determination method [for copyright status]

Do we have a Determination_method [for copyright status] that covers the license {{Anonymous work}} {{PD-US-unpublished}} ? --RAN (talk) 07:03, 5 February 2020 (UTC)

Richard Bachman - pseudonym for Stephen King

Hello all, I know that pseudonyms were discussed here several times previously and the consensus was that they represent the same entity as the person. There is a very small minority of items which still do not follow the rule, such as Richard Bachman (Q3495759) who corresponds to Stephen King (Q39829) but they obviously have separate items. My opinion is that we should be consistent in following the rule above, but what is your preferred solution in case of an item which has so many sitelinks? Would you go for Wikimedia duplicated page (Q17362920)? --Vojtěch Dostál (talk) 10:02, 5 February 2020 (UTC)

How about said to be the same as (P460)? ChristianKl❫ 12:20, 5 February 2020 (UTC)
@ChristianKl: Which one should have the unique identifiers then? They obviously trigger constraint violations :) Vojtěch Dostál (talk) 14:56, 5 February 2020 (UTC)
I would go by whatever the relevant authority considers to be the main name. And it's worth pointing out that VIAF also seems to have a distinct ID for both. ChristianKl❫ 15:07, 5 February 2020 (UTC)

Property for a GLAM institution contributing data to a project/system

Hi everyone, I've got a question concerning GLAMs that are contributing with their collections data to a project, system or infrastructure. I want to display that museum XY is sharing data with e.g. Europeana, K-samsök/SOCH etc. Is there a specific property for that relation? Or which one would you recommend? Tulipasylvestris (talk) 17:12, 5 February 2020 (UTC)

Wikimedia 2030 community discussions: Halfway to the conclusions

Icons of 13 strategic recommendations

As I wrote in my last message here, the strategic recommendations for how we can achieve the Wikimedia 2030 vision are available for your final review. There are three weeks left to share your feedback, questions, concerns, and other comments.

These 13 recommendations are the result of more than a year of dedicated work by working groups comprised of volunteers and staff members from all around the world. These recommendations include the core content plus the Principles and the Glossary, which lend important context to this work and highlight the ways that the recommendations are conceptually interlinked. The Narrative of Change offers a summary introduction to the recommendations material. On Meta-Wiki, you can find even more detailed documentation.

Community input has played, and will continue to play, an important role in the shaping of these recommendations. They reflect this and cite community input throughout in footnotes.

In this final community review stage, we're hoping to better understand how you think the recommendations would impact our movement – what benefits and opportunities do you foresee for your community, and why? What challenges or barriers could they pose for you?

After this three-week period, the Core Team will publish a summary report of input from across affiliates, online communities, and other stakeholders for public review before the recommendations are finalized. You can view our updated timeline here as well as an updated FAQ section that addresses topics like the goal of this current period, the various components of the draft recommendations, and what's next in more detail.

Thank you again for taking the time to join us in community conversations, and I look forward to receiving your input. Happy reading!

SGrabarczuk (WMF) (talk) 02:20, 6 February 2020 (UTC)

There don't appear to actually be any changes to the content since the last message here, nor do the talk pages indicate any responsiveness to community concerns. Meh. --Yair rand (talk) 03:03, 6 February 2020 (UTC)
Do you mean changes to the content of the recommendations? This is because the changes will be done after 21 February, when all feedback has been collected. SGrabarczuk (WMF) (talk) 03:23, 6 February 2020 (UTC)

UUID search

Some time ago, we discussed Uniform Resource Name (Q76497) here (Project_chat/Archive/2019/10). This led to the creation of URN formatter (P7470).

Possibly a direct application for this at Wikidata could be to enable search by Universally Unique Identifier (Q195284) (UUID). These alphanumeric strings are used by various identifiers. Sample is 03cefc64-0bb7-42e6-9bf2-2200f42c3318, a value of Musicbrainz for Yankee Stadium (Q675214). URNs normalize these to a format like <urn:uuid:03cefc64-0bb7-42e6-9bf2-2200f42c3318> (lowercase, with "-").

The property documentation template on talk pages of properties with URN formatter (P7470) already includes a query that generates such values for the property (see Property talk:P1004#URN). However, as some normalization is included, querying across identifiers isn't really efficient (test at Wikidata:Database reports/uuid).

If there is interest, a simpler approach could be to generate URN-triples by Wikibase directly from these identifier statements, similarly to the way other triples are generated. --- Jura 13:13, 3 February 2020 (UTC)

Random observation, not everything formatted like a UUID necessarily is a UUID. RFC 4122 has various verified errata, and we don't know what Musicbrainz actually does, e.g., is it correct? – 06:05, 4 February 2020 (UTC)
Sure. Musicbrainz provides some explanation on their website on what goes into their identifiers.
The uniqueness of some versions of UUID is based on the assumption that not every database has, e.g. entries. This might eventually be incorrect. However, none of the "dups" on Wikidata:Database_reports/uuid seem to be due to that. --- Jura 10:23, 4 February 2020 (UTC)

Hi @Jura1:, thanks for mentioning this at The thesauri use GUIDs for new entries, but they don't use <urn:uuid:>, instead they embed them in ark: URLs. Also, all their 30 thesauri are mixed in the same namespace. Could you please explain how P7470 could help this problem? --Vladimir Alexiev (talk) 11:33, 6 February 2020 (UTC)

@Vladimir Alexiev: If you have a UUID like 171394dc-5472-4322-9017-b8ca091e18b0, you could search

SELECT * { ?a ?b <urn:uuid:171394dc-5472-4322-9017-b8ca091e18b0> }


Of course, one could do:

SELECT * { ?a ?b "171394dc-5472-4322-9017-b8ca091e18b0" }

as P7711 has well formed values, but other databases might use a format for the same UUIDs like "171394DC547243229017B8CA091E18B0" and mere string search wouldn't find it. --- Jura 11:52, 6 February 2020 (UTC)
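The normalization described above can be sketched in a few lines of Python. This is only an illustration, not what Wikibase or the query service does: the function names are made up here, and the query only returns anything if URN triples of that shape are actually present in the data.

```python
import uuid


def to_uuid_urn(raw: str) -> str:
    """Return the lowercase, hyphenated urn:uuid: form of a UUID string."""
    # uuid.UUID accepts upper or lower case hex, with or without hyphens,
    # so differently formatted values of the same UUID normalize identically.
    return uuid.UUID(raw).urn


def sparql_for_uuid(raw: str) -> str:
    """Build the triple-pattern query from the comment above (hypothetical helper)."""
    return "SELECT * { ?a ?b <%s> }" % to_uuid_urn(raw)
```

With this, "171394DC547243229017B8CA091E18B0" and "171394dc-5472-4322-9017-b8ca091e18b0" both lead to the same URN, which is the case a plain string search would miss.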

Any talk link should be prevented

e.g. special:diff/849336521.--Roy17 (talk) 20:35, 4 February 2020 (UTC)

@Incnis Mrsi:  – The preceding unsigned comment was added by Jmabel (talk • contribs) at 20:57, 4 February 2020‎ (UTC).
@Jmabel: pings only work if you add a signature in the same edit. @Incnis Mrsi: your edit is up for discussion. --Lucas Werkmeister (talk) 11:22, 5 February 2020 (UTC)
Oops… this was a side effect of a renaming kludge in Wikipedia. Override. Incnis Mrsi (talk) 11:27, 6 February 2020 (UTC)

Missing labels again

See near-square prime, position creation or Beatriz Mayordomo Pujol as random examples with {{Label}} that does not fetch the label in English, nor in other languages, nor with Lua mw.wikibase.getLabel, nor after purging. It was phab:T237984 closed as resolved. --Vriullop (talk) 10:59, 6 February 2020 (UTC)

It is spreading! Last night the Wikidata/FamilyTree template began displaying Q numbers and Category links instead of the names. I thought it would be fixed overnight, but still off. See Commons:Category:Sir Hector Og Maclean, 15th Chief --RAN (talk) 19:13, 6 February 2020 (UTC)
It shows labels up to Q74000000 ([Plan for cancer prophylaxis]), and then Q numbers after that (The risk of rapid and large citrated blood transfusions in experimental haemorrhagic shock). In Special:ExpandTemplates it shows labels, including for items above Q74000000. Peter James (talk) 19:33, 6 February 2020 (UTC)
Special:ExpandTemplates only shows labels above Q74000000 if link=y is specified. Peter James (talk) 19:42, 6 February 2020 (UTC)


Moin Moin together, I have a question about properties. I can see how many Wikidata objects we have, but where can I see the average number of properties per object, and the total number of properties that are set across all objects? Is there something I can have a look at? Regards --Crazy1880 (talk) 20:09, 5 February 2020 (UTC)

Moin Moin, I had a look at, but didn't find anything that helps me. --Crazy1880 (talk) 20:18, 8 February 2020 (UTC)

Bogus category constraint warning

I have just added to Category:Food banks in Canada (Q8463144) a statement thus:

Category:Food banks in Canada (Q8463144) category's main topic (P301) food bank (Q113603) / of (P642) Canada (Q16)

I now see a constraint warning that "food bank should also have the inverse statement topic's main category (P910) Category:Food banks in Canada (Q8463144)", which is clearly nonsense. How widespread is this issue? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:26, 7 February 2020 (UTC)

It's not nonsense. P301 and P910 have one-to-one relation. Use category contains (P4224) instead. --Shinnin (talk) 23:35, 7 February 2020 (UTC)
More likely he wants category combines topics (P971). - Jmabel (talk) 23:42, 7 February 2020 (UTC)
It's nonsense unless you think food bank should have the statement topic's main category (P910) Category:Food banks in Canada (Q8463144). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:57, 7 February 2020 (UTC)
@Pigsonthewing: A fair amount of code, including in particular all the infoboxes on Commons, assume that category's main topic (P301) is a 1-to-1 relation. Please listen to the warnings you're getting, and don't break this. The statement you added does not conform to the expected behaviour for category's main topic (P301). Jheald (talk) 01:02, 8 February 2020 (UTC)
The warning I'm getting says, completely nonsensically, "food bank should have the statement topic's main category (P910) Category:Food banks in Canada (Q8463144)"; what it should apparently say is that the logical "X of Y" statement is not deemed a valid statement on a category, and that an alternative is preferred. Again: How widespread is this issue? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:29, 8 February 2020 (UTC)
Again: this is exactly why Category:Food banks in Canada (Q8463144) category's main topic (P301) food bank (Q113603) / of (P642) Canada (Q16) is the wrong way to do this. It should be and . - Jmabel (talk) 15:38, 8 February 2020 (UTC)
Again: I'm pointing out that the constraint message that is shown when this method is employed tells people to do something nonsensical, rather than how to correct it; and am asking How widespread is this issue? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:48, 8 February 2020 (UTC)

Associated Press articles

When we have a Q item that is an Associated Press article do I add author=Associated Press or is that field only expected to contain a human? I know we have a bot calculating copyright status for books and articles based on the year of death for the human author. --RAN (talk) 03:07, 8 February 2020 (UTC)

I think this should be a human(s) or with an actual value of unknown value if there is no byline. There are some articles written by a machine (e.g. recaps of sporting events) but those are rare and probably not even copyright-able. —Justin (koavf)TCM 03:14, 8 February 2020 (UTC)
  • Where should we store "Associated Press" so that someone can search for all the Associated Press articles? --RAN (talk) 03:40, 8 February 2020 (UTC)
    • i'm no expert but you could list Q40469 as the publisher of the article and then query for all articles published by Q40469. or you could make a new entity representing an "AP article" and have the articles be an instance of that entity. not sure which would be preferred. BrokenSegue (talk) 04:10, 8 February 2020 (UTC)
    • I think that the Associated Press is strictly a "syndicate" and then an individual newspaper or magazine would have a "publisher", so that is how the AP should be stored. —Justin (koavf)TCM 04:50, 8 February 2020 (UTC)
I agree that the local paper is the publisher, so that leaves, temporarily, that the author is the Associated Press. We do something similar for studio photographs. Sometimes the creator is a person, and sometimes it is a photo studio. --RAN (talk) 06:10, 8 February 2020 (UTC)
No, but that's not what is happening here: A Peanuts comic strip published in the Indianapolis Star was not "authored" by United Feature Syndicate. It was authored by Charles M. Schulz and syndicated or distributed by United Feature Syndicate. These are totally different roles. —Justin (koavf)TCM 06:17, 8 February 2020 (UTC)
They have two roles: Associated Press reporters author news reports, which are sent by telegraph and later teletype to subscribing newspapers. You appear to be saying that they may act as a distributor for existing news stories authored by third parties. In your case above we know that Charles M. Schulz is the creator, while AP employees are anonymous, more similar to the photo studio employees that take images anonymously. --RAN (talk) 07:52, 8 February 2020 (UTC)
Yes, they have different roles but just because we know who Charles Schulz is and we don't know who wrote a particular AP piece doesn't change the fact that AP is just a company that distributes things, they are not an author or a photographer or an illustrator or one of any other persons who may make a news story. This is an excellent example of when and why we should use unknown value. —Justin (koavf)TCM 08:06, 8 February 2020 (UTC)
So should we use Property:P750 to list Associated Press as the "distributor", like we do for movies? —Scs (talk) 18:13, 8 February 2020 (UTC)
  • Excellent! Problem solved. Thanks. --RAN (talk) 20:47, 8 February 2020 (UTC)

Add dissolved, abolished or demolished (P576) or similar to all former building or structure (Q19860854)?

Is it a good idea? Is it doable by a bot?

Ideally each former building/structure should have a state of conservation (P5816) and a state of use (P5817), but I believe a bot would not be smart enough to guess.

Cheers! Syced (talk) 11:32, 8 February 2020 (UTC)

  • A former airport isn't necessarily demolished. --- Jura 11:36, 8 February 2020 (UTC)
    • It is dissolved, so it is dissolved, abolished or demolished (P576), though. Syced (talk) 12:00, 8 February 2020 (UTC)
      • I think "dissolved" is meant for organizations. There is a closing date property. --- Jura 12:07, 8 February 2020 (UTC)
      • If something can be "dissolved but not demolished" (think of all our conflations of museums, hospitals, schools and so on, with their buildings, for example), why do we only have one property, and not two? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:33, 8 February 2020 (UTC)
  • It would just be an exercise in garbage in, garbage out. There's absolutely no certainty about how users are using the 'former' values. Getting bots to make sweeping inferences is not a good substitute for diligent curation. --Tagishsimon (talk) 11:43, 8 February 2020 (UTC)
    • "There's absolutely no certainty about how users are using the former values" That's worrying! I guess that's why this is gradually being replaced with more specific properties. Thanks for the feedback! Syced (talk) 12:00, 8 February 2020 (UTC)
Next question, does anyone disagree that former administrative territorial entity (Q19953632) should only be applied to territorial entities that have been dissolved? Thanks! Syced (talk) 12:00, 8 February 2020 (UTC)
Partly yes, partly this value should be burnt with fire. It certainly should not be applied to a current administrative entity. It arguably should not be applied to anything. An e.g. council which has demised as a result of local government reorganisation, is a council which ideally has, at least, an end date. P31=Council plus End Date = all you need. It was a council. It was never a former council. Much the same holds for all 'former *' values. --Tagishsimon (talk) 12:17, 8 February 2020 (UTC)

Blocked for edits

Can someone explain to me why my bot, including PAWS, is constantly blocked for reading and writing, while [bots] are making several edits per minute at the same time? Edoderoo (talk) 16:19, 8 February 2020 (UTC)

Could you skip these items until the site is more stable? --- Jura 16:27, 8 February 2020 (UTC)
Hmm, the last time the site was stable, Johannes Paulus was still our pope. Your answer is anyway not an answer to my question. Why are my scripts blocked, while others keep on adding items at a higher rate than I can even add descriptions to them? Edoderoo (talk) 16:49, 8 February 2020 (UTC)
Adding an item of a scholarly article seems to be clearly more valuable than adding a Dutch name to an item of a scholarly article. Given that the amount of edits the query service can do is limited, it's easy to understand why someone might prefer limiting your bot. That said, if you want to hear something concrete when you ask about your bot, how about adding a link to your bot to your userpage or linking it in a discussion like this? ChristianKl❫ 19:21, 8 February 2020 (UTC)
It seems that creating an item for a scholarly article is half as expensive as adding a "." to one of its descriptions. Isn't it? --- Jura 19:30, 8 February 2020 (UTC)
  • All bots should obey the "maxlag" constraint that blocks edits when maxlag is over 5. PWB also stops reading at high maxlag, but I'm not sure that was the intended behavior, I think that is being reviewed. In the meantime, please report misbehaving bots to the Admin noticeboard. ArthurPSmith (talk) 20:39, 8 February 2020 (UTC)
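For bot operators who have not come across it, the "maxlag" behaviour mentioned above could look roughly like this. It is a sketch only: `send` stands for whatever function your bot uses to post one API request and return the decoded JSON (hypothetical here), and the waiting times are arbitrary. The one thing taken from the MediaWiki API is that a request carrying a `maxlag` parameter is rejected with the error code "maxlag" when the servers are lagged beyond that value.

```python
import time
from typing import Any, Callable, Dict


def call_with_maxlag(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    params: Dict[str, Any],
    maxlag: int = 5,
    retries: int = 5,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Send one API request, backing off while the server reports maxlag."""
    # Add the maxlag parameter to every request so the server can refuse it.
    request = dict(params, maxlag=maxlag)
    for attempt in range(retries):
        response = send(request)
        error = response.get("error", {})
        if error.get("code") != "maxlag":
            # Either a normal result or some other error: hand it back.
            return response
        # Server is lagged: wait a little longer on each attempt, then retry.
        sleep(wait_seconds * (attempt + 1))
    raise RuntimeError("server still lagged after %d attempts" % retries)
```

Frameworks such as Pywikibot handle this themselves; the sketch is mainly meant for hand-written scripts.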

Fiction database IDs for real items?

Is it appropriate to add IDs from databases devoted to fiction franchises like Star Trek to real items like Andromeda Galaxy (Q2469)? An example of such an edit may be found here. Jc3s5h (talk) 16:36, 8 February 2020 (UTC)

What alternative do you suggest? --- Jura 16:38, 8 February 2020 (UTC)
For fictionalised people it is important to have a separate item that is different from the item for the actual person. For fictionalised things or places... possibly not? It depends whether there are statements to be made about the thing that will be true only of the fictionalised item. A property like place of birth (P19) comes to mind. Does it matter if a fictionalised character has for their place of birth a real city? Is that going to create an unpleasant 'gotcha' for unwary query writers asking "who was born here?". Or, is it better to include fictional Dubliners by default, if the query writer does not include ?item wdt:P31 wd:Q5? There's a balance of different considerations to take into account here, I would say. Jheald (talk) 16:50, 8 February 2020 (UTC)
Imagine that in some fiction the Andromeda Galaxy does not have exactly the same features/properties as the real one. Someone may then want to add those data, which are obviously not consistent with the data for the real one. It then needs its own item … now imagine that some statements about this fiction use the real item and others the fictional one … this would be a mess. author  TomT0m / talk page 17:09, 8 February 2020 (UTC)
I’ll add that this could even be done for biographies that are romanticised, for example: I read in a report about constraints that such items can be a problem for constraints and sometimes require exceptions to constraint violations (see Wikidata:2020_report_on_Property_constraints#Fictional_entities). author  TomT0m / talk page 17:18, 8 February 2020 (UTC)
We probably want a fictional Andromeda Galaxy in Star Trek, with fictional analog of (P1074) pointing to the real one. That way there is little chance that results about Star Trek pollute the real data about the Andromeda Galaxy. I think it is sane to do this systematically; it avoids having to care about filtering out fiction data in every query. Data about fiction can pop up at any time in any item … This could break perfectly good queries or infoboxes at any time. author  TomT0m / talk page 17:02, 8 February 2020 (UTC)

Jura asks what alternative I suggest. I suggest that fiction, and databases devoted to fiction, will routinely mention vast numbers of real people and places, and not attribute to these real items any properties that are significantly different than the true properties. In general I don't think these occurrences are notable enough to make any mention of them in the item about the real item. One might make an exception if the mention in fiction affected the real item, for example, if a real given name (Q202444) suddenly became more frequently given to babies because it was the name of an important, popular, character. Jc3s5h (talk) 17:44, 8 February 2020 (UTC)

Note that creating fictional items linked to the real one still allows retrieving the information from the real item where it does not differ for the fictional item. This therefore does not require duplicating information, so it is not a lot of work to create the fictional item anyway. author  TomT0m / talk page 20:17, 8 February 2020 (UTC)
  • The basic function of identifier statements on items is to find the same concept in other databases. It seems to me that this is exactly what you are describing. What content the database includes wouldn't really matter. Obviously, if the database didn't have any content, I don't see why we added the identifier in the first place. --- Jura 20:23, 8 February 2020 (UTC)
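To make the "who was born here?" concern from this thread concrete, here is a small sketch that builds the query with and without the `?item wdt:P31 wd:Q5` restriction. The helper name and its parameters are invented for illustration; only P19, P31 and Q5 come from the discussion above.

```python
def born_in_query(place_qid: str, humans_only: bool = True) -> str:
    """Build a SPARQL query for items whose place of birth (P19) is place_qid."""
    lines = [
        "SELECT ?item WHERE {",
        "  ?item wdt:P19 wd:%s ." % place_qid,
    ]
    if humans_only:
        # Restrict to instances of human (Q5); without this line, fictional
        # characters with a real place of birth would also be returned.
        lines.append("  ?item wdt:P31 wd:Q5 .")
    lines.append("}")
    return "\n".join(lines)
```

Calling it with `humans_only=False` gives the query an unwary writer might use, which is where fictional people born in a real city would show up.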


We have Wikidata property for an identifier that does not imply notability (Q62589320) used for Findagrave. Is it being used to say "does not imply notability" for Wikidata or English Wikipedia? I would say for English Wikipedia it "does not imply notability". I see no restrictions in our basic tenets that would prevent adding any/all entry/entries in Findagrave to Wikidata. The thing preventing us is no one wants to do the work. Am I right or wrong? --RAN (talk) 02:16, 4 February 2020 (UTC)

  • The item does exist to talk about Wikidata notability. Adding all entries would be the job of a bot and we do have a policy for bot approvals and have discussions around whether or not we want to import large data sets. In contrast to that Wikidata property for an identifier that does not imply notability (Q62589320) is not an item which has meaning that's encoded in our policies. When it comes to individual items from FindAGrave, if someone contests them they go to requests for deletions and we will have a discussion on requests for deletions whether or not we think the item should be in Wikidata. Afterwards someone might update a Wikidata property for an identifier that does not imply notability (Q62589320) status, so that it's easier for people who don't follow discussions on requests for deletions to know what we actually delete, but there's no policy that actually makes Wikidata property for an identifier that does not imply notability (Q62589320) guiding discussions on requests for deletions. ChristianKl❫ 08:38, 4 February 2020 (UTC)
  • Wikidata is not a general genealogy site. If a majority of entries corresponded to people we already had an entry for, or perhaps separated by one degree of separation, then a broad import might be acceptable. But that is simply not the case for Find-a-grave. Jheald (talk) 20:34, 4 February 2020 (UTC)
  • This describes Wikidata's position (though tbh a lot of the ones with "implies notability" need a cleanup). It does not mean FindAGrave items are "not notable", but rather that the site itself doesn't automatically imply notability as seen by the policy. By comparison, an item with a property for something like Oxford Dictionary of National Biography ID (P1415) is almost certain to be considered notable, because the identifier points to a significant scholarly source. Andrew Gray (talk) 21:07, 4 February 2020 (UTC)
@Andrew Gray: What is "the policy", can you quote it please! Current policy reads notable if: "It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references." Are we saying that Findagrave is frivolous, and not serious? Also there is no "one degree of separation" rule! We have entries for over 10 generations of US presidential families. ChristianKl's arguments are sound, in that we have not all agreed yet to upload that particular data set, but I see no Wikidata:notability rule preventing us from doing that. Large data sets need community approval, Wikidata is already so large that we are experiencing computational constraints. Searches already timeout because the data set is too large, I can no longer search for people who died before they were born to look for errors. My summary: the wording: "Does not imply notability" is incorrect, "not yet approved for bot upload" would be the proper wording. Or "does not imply notability for Wikipedia" would also be correct. --RAN (talk) 13:18, 5 February 2020 (UTC)

If we have serious doubts about Wikipedia itself being citable in this respect, then I would think of Findagrave as one level below that. Do people see that differently? - Jmabel (talk) 16:26, 5 February 2020 (UTC)

Is your concern reliability or notability? You can visit the page of, as of yet, uncorrected VIAF errors. You can look at the >1,000 fixes I made of people who were dead before they were born or lived more than 120 years who are not flagged as super-centenarians here at Wikidata. Most were typos here or typos at the source we imported the data from. Other errors were from conflating people of the same name and importing a birth or death date that was wildly incorrect. You would have to calculate error rates for each data set and compare to see which was the most reliable: Findagrave or Wikidata or VIAF. For instance has now flagged all people who died before they were born, married before age 14, and lived over 120 years as potential errors, so these errors can be corrected. --RAN (talk) 18:02, 5 February 2020 (UTC)
  • The most pressing concern about adding any new large data sets is our current computational throughput, not notability. --RAN (talk) 08:59, 9 February 2020 (UTC)
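The kind of sanity check mentioned in this thread (died before birth, lived more than 120 years) can also be run offline on exported dates rather than through a query that times out. A minimal sketch, with an invented function name and the 120-year threshold taken from the comments above as an adjustable assumption:

```python
from datetime import date
from typing import List, Optional


def lifespan_flags(born: Optional[date], died: Optional[date], max_age: int = 120) -> List[str]:
    """Return a list of reasons why a birth/death pair looks suspicious."""
    flags: List[str] = []
    if born is None or died is None:
        # Nothing to compare when either date is missing.
        return flags
    if died < born:
        flags.append("died before birth")
    else:
        # Completed years of age; subtract one if the birthday was not reached.
        age = died.year - born.year - ((died.month, died.day) < (born.month, born.day))
        if age > max_age:
            flags.append("lived more than %d years" % max_age)
    return flags
```

A flag only marks a record for review; as noted above, many of these turn out to be typos or conflated people rather than anything a script could fix.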


At 80 years after author's death (Q29940641) there is a curious entry for start_date=31 December 1986. I am not a math expert but I think 2020-1986=34, am I missing something? Shouldn't it be 2020-80=1940 as the latest date for a death? ... and there is no point putting a fixed value in the data field, it will increment each year. --RAN (talk) 21:16, 7 February 2020 (UTC)

  • You could ping the user who added it, but it might mean that this was applicable to Spain until that date. --- Jura 21:57, 7 February 2020 (UTC)
I haven't found at what point it was added, there is no clear note in the edit summary that I can find. --RAN (talk) 03:01, 8 February 2020 (UTC)
I suppose the statement means that copyright in Spain lasts until 80 years after the death of the author if the author died at latest 31 December 1986. (If the author died at a later date, the copyright will last until 70 years after the death). If you compare with c:Commons:Copyright rules by territory/Spain#General rules, you will notice that according to Commons the 80 years protection is valid for authors who died before 7 December 1987, thus almost a year longer than stated here. --Dipsacus fullonum (talk) 08:14, 8 February 2020 (UTC)
That makes sense, we always have trouble force fitting important information within the constraints of our field names. Better confusing information than the absence of information. --RAN (talk) 07:23, 9 February 2020 (UTC)
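The arithmetic behind that reading can be spelled out as follows. This is only a sketch of the interpretation given in this thread, not legal advice: the cutoff defaults to the 31 December 1986 value on the item (Commons gives a different date, as noted above), the function names are invented, and it assumes the term is counted in whole years added to the year of death.

```python
from datetime import date


def protection_years(death: date, cutoff: date = date(1986, 12, 31)) -> int:
    """80 years if the author died on or before the cutoff, otherwise 70."""
    return 80 if death <= cutoff else 70


def term_end_year(death: date, cutoff: date = date(1986, 12, 31)) -> int:
    """Year of death plus the applicable term, under the assumptions above."""
    return death.year + protection_years(death, cutoff)
```

So the 1986 date is a condition on when the author died, not a value that increments each year; an author who died in 1940 gives 1940 + 80 = 2020, which matches the 2020 - 80 = 1940 calculation in the question.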

How to say "meets criteria" or "doesn't meet criteria"? and "How?"

For example, Dodgson's method (Q5287927) meets or fulfills Condorcet criterion (Q831304) while Borda count (Q576578) does not. Currently Dodgson's method (Q5287927) is listed as "instance of" Condorcet criterion (Q831304) which doesn't seem right.

(Also, Ranked pairs (Q1686629) is an "instance of" voting system (Q182985), which seems right, while

Borda count (Q576578) is a "subclass of" ranked voting (Q1078003), which I'm not sure is right. I think it should be an instance of both?

Ranked voting is a subclass of voting system, and Borda count and Ranked pairs are instances of both classes?) Omegatron (talk) 02:49, 9 February 2020 (UTC)

@Omegatron:This actually came up on the project chat last week. While the discussion didn't come to a firm conclusion, using complies with (P5009) seems to be the most specific solution at the moment. It also seems that there are multiple kinds of Borda count (Q576578), so using subclass of (P279) is reasonable. Vahurzpu (talk) 03:07, 9 February 2020 (UTC)
@Vahurzpu: "complies with" sounds good, but how would you say "doesn't comply with"?
Multiple kinds of Borda count? You mean these variations? Omegatron (talk) 03:41, 9 February 2020 (UTC)
Accordingly, I added "How?" to the section header. --- Jura 13:49, 9 February 2020 (UTC)

Items refusing to appear

Does anyone else deal with this problem? It's pretty annoying constantly having to refresh a page just to get the items to appear. --Trade (talk) 00:46, 5 February 2020 (UTC)

Yes! Has always happened intermittently. Add a field, hit publish and it disappears until refresh. --RAN (talk) 13:14, 5 February 2020 (UTC)
Most of the times I can't simply "hit publish". I have to focus somewhere else, then return focus to edit field and then I can hit publish. Strange behavior. --Infovarius (talk) 00:03, 11 February 2020 (UTC)

Technical problem with the Living People policy

Hi, on Sasha Grey (Q2709) I removed mass (P2067) as ancient and undesirable per policy, cf. Talk:Q2709 here and on enwiki. Wikilinks to the 2019 history about the corresponding weight in an infobox are available on demand; meanwhile {{Infobox person}} no longer supports this on enwiki.
However, ruwiki still supports it, and of course folks try to import the "missing" statement here: How is that supposed to be handled? – 09:44, 6 February 2020 (UTC)

  • The policy doesn't consider the property undesirable per se. I think there's a good chance that the test of "widespread public knowledge or are openly supplied by the individual themselves" can be passed here. As far as automatically importing statements subject to the living people policy, the living people policy has a section on how to deal with the relevant bot approvals. If a bot imports statements from Wikipedia's without honoring that policy, raise the issue with the bot operator and if that doesn't work here. ChristianKl❫ 16:35, 6 February 2020 (UTC)
  • I can see in ru:Саша Грей mass=55kg with a reference to an article in Rolling Stone. I have no idea whether this source is ancient or not, but I fail to see how "ancient" or the presence/absence of "weight" in the en-infobox has anything to do with Wikidata. The mass statement violates neither ru:WP:BLP nor WD:BLP. Ghuron (talk) 15:23, 7 February 2020 (UTC)
    WD:BLP—nice shortcut, now hoping for a Russian translation—wikilinks WikiProject Properties/Wikidata properties that may violate privacy and WikiProject Properties/Wikidata properties likely to be challenged.
    The latter contains mass (P2067). In sports (example) a moving target such as "mass" could make sense with a point in time qualifier, but for actresses—note the suspicious absence of actors—I call BS. As far as I can tell, the more than ten-year-old sources for 55kg are based on a scanned 2006 driver licence. – 16:23, 7 February 2020 (UTC)
    Yes, according to WD:BLP P2067 statements should be supported by a reliable public source and yes, the Rolling Stone article that states 110 pounds is exactly that type of source (and has nothing to do with a driving license). Maybe it would make sense to add a point in time (P585) qualifier. Ghuron (talk) 09:15, 8 February 2020 (UTC)
    You are right, mass (P2067) is not in our highest privacy status but in the "likely to be challenged"-status and that status seems to be supported by that source. ChristianKl❫ 15:19, 10 February 2020 (UTC)
If the source is accurate, it should be restored. It is physics, not "ancient and undesirable" information. And the decision to remove body masses appears to be one person's opinion here at Wikidata based on a decision at English Wikipedia. We remove living people information to prevent identity theft and protect minors, no one uses body mass as a personal identifier since it changes from year to year. --RAN (talk) 03:46, 8 February 2020 (UTC)
Only me happens often, but I wasn't involved in anything related to WD:BLP and its two lists. – 08:41, 8 February 2020 (UTC)
  • One should not remove info from Wikidata because of some local rules. 11:15, 8 February 2020 (UTC)
    Ignore local rules, enwiki was only an example. This is about WD:BLP + related global rules on Meta, if there are any, or in plain DEnglish sexism. – 16:09, 9 February 2020 (UTC)

Wikidata step-by-step

Is there some sort of a guide that suggests things to try for new contributors ?

e.g. if someone decides to invest some time every week they could go through a list of things to do (by actually editing). We currently have "tours", but they don't seem to actually lead to edits.

Some samples (each done in a separate, short session):

  1. find three missing facts to add to Wikidata (maybe from some generated list)
  2. create three new items for Wikipedia articles
  3. fix three labels, etc.
  4. create a series of items on some topic manually
  5. create a series of items on some topic with Petscan
  6. create a series of items on some topic with Quickstatements

Not sure about the sequence, but the general idea is to get to know various aspects if one can't invest too much time at once.

Some thought is needed that a few users don't exhaust all samples. --- Jura 13:58, 9 February 2020 (UTC)

  • There are many different ways to contribute to Wikidata. Different people are motivated to do different tasks. I don't think many people start out with an intention of investing time X every week into Wikidata. ChristianKl❫ 15:15, 9 February 2020 (UTC)
    • Well, if you decide to contribute by maintaining, e.g. the items about NASA, you'd probably want to know how other aspects work .. --- Jura 16:16, 9 February 2020 (UTC)
    That's a fair point. I think it's worthwhile if such material is written with a good mental model of newcomers. ChristianKl❫ 15:20, 10 February 2020 (UTC)
  • Similar features have been deployed to some Wikipedias: mw:Growth/Personalized first day/Newcomer tasks. --Matěj Suchánek (talk) 10:49, 10 February 2020 (UTC)
    • Interesting .. seems to be a whole ocean of things. I need to take a closer look. I hadn't exclusively in mind newcomers strictly speaking, but users who want to explore additional features, are interested in trying, but not necessarily read too much documentation beforehand and search for uses .. --- Jura 21:05, 10 February 2020 (UTC)

does Western Punjabi Wikipedia pnb.* پنجابی have a higher than average number of trolls?

I don't speak it, but a lot of things look suspect and when I look them up they make even less sense. I'm learning Urdu and I occasionally look up Western Punjabi to compare, since there is a lot of overlap. I find for most languages the matching wiki page is a good quick way to check i'm getting the right concept, and not mixing it up with a synonym. For most languages this works well and confirms or clarifies what I find in other sources, but for پن٘جابی I find the result often looks a bit off, and when I look it up it seems to be some sort of joke. But for Western Punjabi there aren't many good quality accessible resources to compare to, so am i just getting muddled or does have a higher than average troll problem? Is there a way to approach it if i find what looks like a troll but i don't know how to fix it? Is there a way to flag things for a fluent speaker to check? Irtapil (talk) 01:29, 10 February 2020 (UTC)

I cannot help with the OP's problem, but I would love to have a way to flag things to be checked by a fluent speaker of a specific language. Bovlb (talk) 16:28, 10 February 2020 (UTC)

Interest in This Tool?

I made a browser extension that rips structured data from Google's knowledge panel search results and posts the data to wikidata. I'm aware of concerns about sourcing data from Google (it can create cycles of information given they source from us). But it's a good semi-automated tool that currently only pulls out:

  • Social media links
  • Freebase/Google KB ids

The changes it makes to wikidata look like this. The results need to be hand audited for accuracy since Google's data is noisy but it's much faster than doing it by hand.

My questions are:

  • Is there any interest in my open sourcing/distributing this tool? Doing so would require some effort to make it more user-friendly.
  • Are there thoughts about pulling more structured data from these panels (e.g. inception dates).

Also, is this the most active place to discuss the project or is there a mailing list/IRC that would be better?

BrokenSegue (talk) 22:17, 5 February 2020 (UTC)

I'm well aware of the concerns regarding sourcing data from Google, but I think we should be quite safe extracting social media links. --Trade (talk) 22:30, 5 February 2020 (UTC)
Doesn't Google rip data from Wikidata? Seems likely to create circular citations, and amplify "citogenesis", corrupting verifiability of the origins of data. -Animalparty (talk) 23:24, 5 February 2020 (UTC)
yes, but the tool only populates fields that are currently empty in wikidata and I hand audit the results (it's not fully automated and cannot be fully automated). examining the data it's clear Google is sourcing the social media information primarily from somewhere else. and the population of the freebase/google kb info seems unproblematic since they can't be sourcing that from us. BrokenSegue (talk) 00:23, 6 February 2020 (UTC)
This sounds like the tool does populate fields where Google mirrored our data and then we deleted our data. I think the tool is fine as long as there's hand auditing of the results. ChristianKl❫ 16:45, 6 February 2020 (UTC)
Yeah that is a reasonable concern though I think that's a minority of cases and yeah i'm hand auditing the results. BrokenSegue (talk) 17:05, 6 February 2020 (UTC)
@BrokenSegue: Is it possible to have these edits include a reference? You have "scraped data from google" in the edit summary, but it would be really helpful in the long run if you could add some kind of reference like stated in (P248):Knowledge Graph (Q648625) (or possibly a more appropriate item) Andrew Gray (talk) 20:12, 10 February 2020 (UTC)
@Andrew Gray: good idea. I added that feature. you can see it here BrokenSegue (talk) 04:44, 11 February 2020 (UTC)
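
The two safeguards discussed in this thread (only filling properties that are currently empty, and attaching a stated in (P248): Knowledge Graph (Q648625) reference) could be sketched roughly as follows. This is an illustration only, not the extension's actual code: the function name, the dictionary layout and the sample property IDs and values are assumptions made for the example.

```python
from typing import Dict, List

STATED_IN = "P248"           # "stated in", as suggested in the thread
KNOWLEDGE_GRAPH = "Q648625"  # Knowledge Graph item named in the thread


def plan_additions(existing: Dict[str, List[str]],
                   scraped: Dict[str, str]) -> List[dict]:
    """Return statements to add, skipping any property that already has a value.

    `existing` maps property IDs to the values already on the item;
    `scraped` maps property IDs to the single value found in the panel.
    Both layouts are hypothetical and chosen for this sketch.
    """
    planned = []
    for prop, value in sorted(scraped.items()):
        if existing.get(prop):
            # The item already has a value here: leave it untouched.
            continue
        planned.append({
            "property": prop,
            "value": value,
            "reference": {STATED_IN: KNOWLEDGE_GRAPH},
        })
    return planned


if __name__ == "__main__":
    # Made-up sample data: the item already has one social media handle.
    existing = {"P2002": ["example_handle"]}
    scraped = {"P2002": "other_handle", "P646": "/m/0example"}
    for statement in plan_additions(existing, scraped):
        print(statement)
```

A check like this cannot tell whether a value is missing because it was deliberately removed, so the hand auditing mentioned above would still be needed.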

Q56299775 vs Q1122799

Comstock laws (Q56299775) vs Comstock laws (Q1122799). How should we handle this, should the two French Wikipedia articles be merged? We have the various laws passed under one article in the English Wikipedia but there are two articles in the French Wikipedia, one is on a single act of Congress and the other on the various laws passed. --RAN (talk) 01:28, 10 February 2020 (UTC)

Although one of the French articles - - leaves something to be desired, it appears to be about the set of laws, whilst seems to be about the 1873 'parent' act. My view: swap the FR sitelinks around, clarify the label and description of Q56299775 - it's an act, not laws - and then use has part / part of to link the two wikidata items. --Tagishsimon (talk) 09:11, 10 February 2020 (UTC)
  • @Tagishsimon: Go for it, I think you have the best grasp! --RAN (talk) 04:46, 11 February 2020 (UTC)

Label/alias in mul

Is it possible to add label and aliases with 'mul' as a language? In some cases there are names, symbols etc. that are correct for many languages (like symbols of the elements for example) and it's quite tedious to add such things to many languages as having the same aliases in many languages does not help in searching etc. Wostr (talk) 15:09, 22 January 2020 (UTC)

Slightly OT, years ago I was on the language tag review and related RFC-update lists, exploring mul + und issues (after en-GB survived a basic plausibility check) can be fascinating. On YouTube I gave up to find a tag for Kobaian. Is und-Latn (example) a thing? After all you would at least know the script. – 02:16, 28 January 2020 (UTC)
Previous discussions: Wikidata:Project_chat/Archive/2013/07#Global_labels, Wikidata:Project_chat/Archive/2019/06#Multilanguage_label has ended with nothing. --Infovarius (talk) 15:57, 29 January 2020 (UTC)
Given that there's support and no pushback, maybe we need a Phabricator ticket? ChristianKl❫ 08:52, 3 February 2020 (UTC)
Please make sure to get this right per RFC 5646 section 4.1 clause 5.
I just saw that enwiki has mis for Kobaian, and zxx non-linguistic can be also interesting. In one of the two archived discussions wikilinked above you had a mul-Latn, that's not the same as und-Latn, there is a SHOULD NOT for mul in BCP 47 = RFC 5646. – 05:17, 4 February 2020 (UTC)
There's a SHOULD NOT but I don't think it applies to our planned usage. "This subtag SHOULD NOT be used when a list of languages or individual tags for each content element can be used instead." A list of languages is not possible for our use-case. We actually mean all languages and not a subset that could be expressed by a list. I think we comply with the recommendations in that document. If you think we don't, please point to the specific portion that you think gets violated by the planned usage.
zxx seems to be a good idea for special unicode characters. ChristianKl❫ 17:43, 10 February 2020 (UTC)
^.^b All fine if you checked it, the experts on phab: can sort it out. @Cbrescia: My comment about shp in January was misleading, if MediaWiki now tries to cover all plausible language tags. – 11:36, 11 February 2020 (UTC)
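
To illustrate what is being asked for in this thread, here is a minimal sketch of how a consumer might combine language-specific aliases with aliases stored once under 'mul'. It assumes that 'mul' aliases were accepted, which is only a proposal here; the function name and the data layout are hypothetical.

```python
from typing import Dict, List


def aliases_for(aliases: Dict[str, List[str]],
                language: str,
                shared: str = "mul") -> List[str]:
    """Return the aliases for `language`, followed by any shared 'mul' aliases.

    Aliases already present for the language are not repeated.
    """
    combined = list(aliases.get(language, []))
    for alias in aliases.get(shared, []):
        if alias not in combined:
            combined.append(alias)
    return combined


if __name__ == "__main__":
    # Hypothetical item for a chemical element: the symbol is stored once.
    aliases = {"mul": ["Fe"], "de": ["Ferrum"]}
    print(aliases_for(aliases, "de"))
    print(aliases_for(aliases, "pl"))
```

With this layout the symbol would be entered a single time instead of once per language, which is the tedium the original question describes.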

RAL Colors

I was working on colors to improve them and I found that there are a lot of RAL colors instance of RAL classic color (Q17421658). I would like to suggest a batch change and an addition of a property.

First I think it would be better to have the name of the RAL colors as label and the RAL 0000 number part as an alias. Additionally I think it would be great to add the RAL identifier as an external identifier, so we would need a property RAL identifier or similar.

Any ideas or thoughts on the further process for this?

--DaSch (talk) 19:33, 5 February 2020 (UTC)

Link from Commons to Wikidata for entries on images

At Wedding Deferred, Commits Suicide (Q84572479), for instance, we have a Wikidata entry for a news article hosted at Commons. I can click on the image to get to Commons ... but if I was at Commons how would I know there was a Wikidata entry for this image? --RAN (talk) 06:13, 8 February 2020 (UTC)

I added a P6243 statement on its structured data, does that look right? Ghouston (talk) 06:23, 8 February 2020 (UTC)
Or you can check File usage on other wikis section on commons. ‐‐1997kB (talk) 06:48, 8 February 2020 (UTC)
Ok, both are clever, thanks! Now I see the "File usage on other wikis". --RAN (talk) 07:38, 8 February 2020 (UTC)
I wonder why we have a massive, and monolithic, quotation or excerpt (P7081) value on that item, rather than the whole (short) article, which is out of copyright, being transcribed on Wikisource, where it can enjoy the use of heading and paragraph markup? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:39, 8 February 2020 (UTC)
Can't we have both? I have had items deleted at Wikisource as out of scope in the past. Quote at Wikidata is limited to 1,500 characters, so not "massive, and monolithic". Sometimes redundancy is good since we have no control over the deletion policies at other projects. Wikipedia has purges of classes of people and things, and the information is only preserved because we have an entry for them here. At one time there was a purge of high schools at Wikipedia and more recently a purge of mayors and county administrators. They even tried deleting the entries for those people/things here at Wikidata. So ... redundancy can be beneficial. --RAN (talk) 17:38, 8 February 2020 (UTC)
@Richard Arthur Norton (1958- ): Storing data in Wikidata statements is more expensive than storing it elsewhere. The query service has limited capacity and that should encourage us to not store longer texts in statements, so that we can have more items/statements in the query service. ChristianKl❫ 08:53, 11 February 2020 (UTC)

time-varying P31

Quite recently, the two small Indian union territories of Dadra and Nagar Haveli (Q46107) and Daman and Diu (Q66710) merged to form Dadra and Nagar Haveli and Daman and Diu (Q77997266).

If the Wikipedia article is to be believed, the new territory is actually ((Dadra and Nagar Haveli) and Daman and Diu). That is, although the former union territory of Daman and Diu is no more, the former union territory of Dadra and Nagar Haveli is now one of the three districts making up Dadra and Nagar Haveli and Daman and Diu.

I have indicated this at Dadra and Nagar Haveli (Q46107) with two different values for instance of (P31), qualified by start time (P580) and end time (P582) in the obvious way. I believe this is a good way to do it.

However, based on discussions I sometimes see here, I suspect there might be an argument for creating two distinct Q-entities for the former union territory as distinct from the current district.

So, if anyone is interested, feel free to correct the tagging at Q77997266 and related entities if I got it wrong, or lobby for creating the second, distinct entity for Dadra and Nagar Haveli. —Scs (talk) 21:05, 9 February 2020 (UTC)

If the new territory is the legal successor of one of the two previous - i.e. it was one territory added to another - then keeping the old and new in one item would be OK. But if a legally new territory was formed out of the two previous, then it must be a new item. It's a grey area which other external identifiers for administrative units handle differently: some assign new identifiers at every merge/split of administrative units, others keep them at least when the name of the successor stays the same. But probably the Wikipedia links will require two items anyway, as there will be at least one Wikipedia which chooses to have different articles about them. Ahoerstemeier (talk) 09:33, 10 February 2020 (UTC)
In this case there are already separate items, but I think the difficulty is that Dadra and Nagar Haveli (Q46107) is being used to represent different kinds of entities before and after the change, union territory of India (Q467745) vs district of India (Q1149652). Since they are different kinds of administrative entities, should they have different items, even though they have the same name and possibly the same territory? Ghouston (talk) 09:58, 10 February 2020 (UTC)
Yes, that's precisely it: Same name and (AIUI) same territory, but a different "kind of entity".
And although I said above that "I believe this is a good way to do it", here's a pretty strong counterargument: Do we expect someone running a query for "all Indian states and territories" (that is, someone trying to check or recreate one of the lists at w:Administrative divisions of India) to explicitly qualify their wdt:P31 query to make sure they get only entities that are currently instances of whatever? —Scs (talk) 11:33, 11 February 2020 (UTC)
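
The extra qualification that query writers would need can be sketched as follows. The Python part models instance of (P31) values carrying start time (P580) and end time (P582) and picks the values valid on a given date; the SPARQL string shows one possible way of asking only for statements without an end time. The class, the function name and the change date used below are assumptions for illustration, not data taken from the items.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Statement:
    value: str                   # P31 value (a Q-id)
    start: Optional[date] = None # start time (P580), if any
    end: Optional[date] = None   # end time (P582), if any


def values_on(statements: List[Statement], when: date) -> List[str]:
    """Return the values whose [start, end) interval contains `when`."""
    current = []
    for st in statements:
        started = st.start is None or st.start <= when
        not_ended = st.end is None or when < st.end
        if started and not_ended:
            current.append(st.value)
    return current


# One possible query for items that are still union territories:
# keep only P31 statements that carry no end time qualifier.
CURRENT_UNION_TERRITORIES = """
SELECT ?item WHERE {
  ?item p:P31 ?statement .
  ?statement ps:P31 wd:Q467745 .
  FILTER NOT EXISTS { ?statement pq:P582 ?end . }
}
"""

if __name__ == "__main__":
    change = date(2020, 1, 26)  # illustrative change date, an assumption
    q46107 = [
        Statement("Q467745", end=change),     # union territory of India
        Statement("Q1149652", start=change),  # district of India
    ]
    print(values_on(q46107, date(2019, 1, 1)))
    print(values_on(q46107, date(2020, 2, 11)))
```

A plain wdt:P31 lookup ignores qualifiers, so with both statements at normal rank it would return both values; that is the cost of keeping one item that the counterargument above points at.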

Wikidata weekly summary #402

We might expose a subgraph with only truthy statements. Or have language specific graphs, with only language specific labels.

It's not clear to me why the SPARQL server needs to know anything about labels or descriptions in the first place. I don't think we would lose much relevant functionality when labels and descriptions would be moved to a different server. On a related note I think that the main venue for communicating information about Wikidata should be Wikidata and if there's a monthly edition of the state of the query service it would be great to have it on Wikidata (it's okay to additionally email it). ChristianKl❫ 08:15, 11 February 2020 (UTC)

Merging 2 items

Could someone merge Q84564408 and Q55771445 ? Same individual. Txs--René La contemporaine (talk) 10:53, 11 February 2020 (UTC)

@René La contemporaine: see Help:Merge. --- Jura 11:15, 11 February 2020 (UTC)
Jura Thank you.--René La contemporaine (talk) 13:24, 11 February 2020 (UTC)

Reduced loading times for Wikidata/Wikimedia Commons

Hello all, While cleaning up (reviewing and rewriting) the backend code of Wikidata and Wikimedia Commons in October 2019, the Wikidata team at WMDE, together with WMF, worked on reducing the loading time of pages. We managed to reduce the loading time of every Wikidata page by about 0.1-0.2 seconds. This is due to a reduction in the number of modules (sets of code responsible for a certain function) that need to be loaded every time someone opens a page. Instead of the 260 modules that had to be loaded before, only 85 modules need to be loaded now when a page is called. This makes it easier to load Wikidata pages for people who only have a slow internet connection.

Size decrease of the initialization loader on Wikidata pages (on Grafana).

Reducing the number of modules called when loading a page equals a reduction of about 130 GB of network traffic for all users every day, or 47 TB per year. The reduction in network traffic translates into a reduction in electricity use, so this change is also good for the environment. Additionally, the interdependencies between the modules were reduced from 4 MB to 1 MB, which improved the loading time per page as well.
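
As a quick check of the figures quoted above, the arithmetic can be reproduced as follows. The only assumptions are a 365-day year and decimal units (1 TB = 1000 GB); the variable names are made up for this sketch.

```python
GB_PER_DAY = 130       # daily traffic reduction quoted above
DAYS_PER_YEAR = 365    # assumption: non-leap year
GB_PER_TB = 1000       # assumption: decimal units

# Yearly traffic reduction in TB.
tb_per_year = GB_PER_DAY * DAYS_PER_YEAR / GB_PER_TB

# Share of modules no longer loaded on every page view.
modules_before, modules_after = 260, 85
module_reduction = 1 - modules_after / modules_before

if __name__ == "__main__":
    print(f"{tb_per_year:.2f} TB per year")
    print(f"{module_reduction:.0%} fewer modules")
```

This gives roughly 47 TB per year, matching the announcement, and about two thirds fewer modules per page load.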

Many thanks to everyone involved in this improvement! If you want to get more details about the actions we performed, you can have a look at the Phabricator board.

If you are developing scripts or tools on top of the Wikidata UI, some documentation will walk you through the architecture of ResourceLoader, what page load performance is, and how to create module bundles with ResourceLoader.

For further questions or feedback, feel free to contact us on this page.

Cheers, for the Wikidata team: Max Klemm (WMDE) (talk) 11:05, 11 February 2020 (UTC)

Items failed to merge because two items contain the same sitelink

Q10813613 was merged to Category:Executed Qin dynasty people (Q24918671), but could not be redirected, because Category:People executed by the Qin dynasty (Q7016834) is linked to the same Chinese Wikipedia page as Q24918671. I don't know which is the correct item for the sitelink, as putting the description into Google Translate suggests Q7016834, but based on the categories Q24918671 looks more likely. Peter James (talk) 12:20, 11 February 2020 (UTC)

The two categories are distinct on both enwiki and zhwiki. Li Si seems to be in one enwiki category but not in the other. Before we can merge on Wikidata, the enwiki and zhwiki would have to merge their categories. ChristianKl❫ 14:03, 11 February 2020 (UTC)
The merged items are Q10813613 and Q24918671, not Q7016834. The zhwiki links on Q24918671 and Q7016834 are identical, which makes it impossible to edit the items unless one of the links is removed, but I don't want to remove a correct link and keep an incorrect one. Peter James (talk) 14:59, 11 February 2020 (UTC)
It's not generally possible to put the same sitelink on multiple items anyway. I assume it was only a bug that allowed it to happen. I'd just remove it from one item and then complete the merge. Ghouston (talk) 22:02, 11 February 2020 (UTC)