Shortcuts: WD:PC, WD:CHAT, WD:?

Wikidata:Project chat

Wikidata project chat
A place to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.

Please use {{Q}} or {{P}} the first time you mention an item or property, respectively.

On this page, old discussions are archived after 7 days. An overview of all archives can be found at this page's archive index. The current archive is located at 2021/04.

How to add an identifier to a wikidata page

Hello everyone. I'm a relatively new user and am asking for help because I can't figure out how to add identifiers to data pages. Any help would be greatly appreciated.  – The preceding unsigned comment was added by PunkWillNeverDie44 (talk • contribs).

How to express nationality (Q231002)?

On cs.wikisource we want to categorise authors via Wikidata according to their nationality, so that authors living in Germany and writing in German are German authors, authors from France writing in French are French authors, etc. But there is a problem with nationality (Q231002).

If we use ethnic group (P172) (the Czech name of this property is nationality), some people say there is no French ethnicity, there is no Swiss ethnicity, etc.

country of citizenship (P27) is too fragmented and problematic for many Czech authors: they were citizens of Austria-Hungary (Q28513) or Cisleithania (Q533534), but were Czechs (Q170217) (and vice versa, e.g. for Polish authors).

There was a proposal for cultural identity, but it was rejected.

Is there some useful way to express that Jan Neruda (Q156321) and Václav Havel (Q36233) were Czechs (Q170217), Jack London (Q45765) and Thomas Jefferson (Q11812) were Americans (Q846570), or that Napoleon (Q517) and Paul Verlaine (Q755) were French (Q121842)? JAn Dudík (talk) 12:43, 1 April 2021 (UTC)

It's ok to make the proposal again, if you have a strong argument for it. Out of courtesy, link to the old proposal from the new one and ping people involved in discussion before. ArthurPSmith (talk) 20:07, 1 April 2021 (UTC)
You didn’t say what is the problem with nationality (Q231002); I presume you mean the quality you’re trying to record is not at all a legal concept. What about nation (Q6266) and national identity (Q1880695)? —Michael Z. 00:13, 2 April 2021 (UTC)
@JAn Dudík: I see two ways to model this. In your example of Jan Neruda (Q156321), one possibility is already implemented. There you will find the statement Jan Neruda (Q156321) ethnic group (P172) Czechs (Q170217), which means that although he has the nationality Austrian Empire (Q131964), he is actually Czech. This implementation can, however, be considered controversial because the property ethnic group (P172) itself is controversial. The second possibility can be seen with several Germans. As an example I would take Ludwig van Beethoven. On his item there is the statement Ludwig van Beethoven (Q255) country of citizenship (P27) Germany (Q183) / reason for deprecation (P2241) anachronism (Q189203). Note that this statement has to be given the deprecated rank. --Gymnicus (talk) 09:36, 2 April 2021 (UTC)
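For readers unfamiliar with ranks, the Beethoven statement described above can be sketched as a plain data structure. This is only an illustration, assuming the usual Wikibase JSON layout for statements; the helper names `item_snak` and `deprecated_claim` are hypothetical, and nothing is sent to Wikidata.

```python
import json


def item_snak(prop, qid):
    """Build a value snak that points at an item (hypothetical helper)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": qid},
        },
    }


def deprecated_claim(prop, value_qid, reason_qid):
    """Build a deprecated statement with a 'reason for deprecation' (P2241) qualifier."""
    return {
        "type": "statement",
        "rank": "deprecated",  # the rank Gymnicus refers to
        "mainsnak": item_snak(prop, value_qid),
        "qualifiers": {"P2241": [item_snak("P2241", reason_qid)]},
    }


# Ludwig van Beethoven (Q255): country of citizenship (P27) Germany (Q183),
# deprecated with reason anachronism (Q189203).
beethoven_claim = deprecated_claim("P27", "Q183", "Q189203")
print(json.dumps(beethoven_claim, indent=2))
```

Because the rank is deprecated, the statement stays visible with its reason, but consumers that only read best-ranked ("truthy") values would normally skip it.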
We never really tackled the nationalism problem. I recall extremely lengthy discussions going nowhere and everywhere. Just doing a property proposal will probably fail. I think last time I suggested making an overview of example cases of (historic) people who we want to link to current countries. For example Frans Hals (Q167654) is Dutch and Peter Paul Rubens (Q5599) is Flemish. Will need a significant effort to properly model this (if that's even possible). Multichill (talk) 17:12, 2 April 2021 (UTC)
I would recommend NOT to add any nationality. For example, people living in the Crimean Peninsula (Q7835): are they now Russians, Ukrainians, or Crimeans, or something else? People living in the non-state Tibet, are they really Chinese? Drop it completely, this is the only solution. Taylor 49 (talk) 19:45, 2 April 2021 (UTC)
On the other side Czech nationality/ethnicity is non controversial for most "Czech" authors 100 years ago (which are on Wikisource). JAn Dudík (talk) 17:44, 4 April 2021 (UTC)
I agree with Taylor 49 - nationality and ethnicity are ill-defined (not only for people whose parents came from different countries), citizenship may change over a person's lifetime. Best classify authors by the language(s) they write/wrote in and maybe by their birth country or birthplace, and by place(s) of residence. Don't use nationality at all, it only leads to fruitless arguments. --Schlosser67 (talk) 08:14, 9 April 2021 (UTC)
If we are concerned with authors, wouldn't native language (P103) and languages spoken, written or signed (P1412) be the relevant properties? Ghouston (talk) 21:27, 2 April 2021 (UTC)
No. languages spoken, written or signed (P1412): some authors wrote their works in multiple languages. And native language (P103): some authors have works in their second language. E.g. Karel Klostermann (Q84648) was from a German family and wrote his works in both German and Czech, but is considered a Czech author. JAn Dudík (talk) 17:44, 4 April 2021 (UTC)
I'd ask why Karel Klostermann (Q84648) is considered a Czech author. Is it because Czech is a "native language?" Native language may be hard to determine, but nationality/ethnicity is even harder. Presumably a person would generally have to know the language associated with a nationality/ethnicity to be considered a member of that nationality/ethnicity. I assume we are considering this concept to be completely independent of citizenship: of course it's possible to be a citizen of a country without knowing any of its languages. Ghouston (talk) 22:47, 4 April 2021 (UTC)
Also, is there anybody who has written books in Czech who wouldn't be considered a Czech author? Ghouston (talk) 23:06, 4 April 2021 (UTC)
There are Czech authors who wrote mainly in German or Latin. And there are also Slovak authors (e.g. Ján Kollár (Q220550)) who wrote their works in Czech.
Karel Klostermann (Q84648) is a Czech author because sources say so (the WP article cites the book ISBN 80-200-0469-6).
This question is not about ethnicity or nationality, but about how to express that author XY is considered a ZZan author. When sources say somebody is a French author, I want to have a property to record that, even if his ethnicity was Hottentot and his citizenship was French Third Republic (Q70802) or Canada (Q16). JAn Dudík (talk) 05:52, 6 April 2021 (UTC)
So the property would just be set according to what some source has claimed, and it's accepted that ethnicity/nationality has no real definition? I don't even know what my own ethnicity/nationality would be (except taking nationality as a synonym for citizenship). Ghouston (talk) 22:27, 7 April 2021 (UTC)
  • Having a nationality property would provoke edit wars over tens of thousands of items for humans who live in disputed areas like Crimea. Whether or not Taiwanese is a nationality is likely also very controversial and would likely lead to broad edit wars that potentially span thousands of items as well. If a source says "John Smith was a Taiwanese author", the source isn't making any statement about whether or not Taiwanese is a nationality. Likewise, a source saying "John Smith is a Czech author" isn't making any statement about his nationality in a way that doesn't rely on subjective interpretation, as long as the source's author isn't explicitly saying that they believe Czech to be a nationality.
While the desire to have such a property is understandable, previous discussions at property proposals repeatedly showed that our community thought the problems outweigh the benefits of having it.
As for ArthurPSmith's suggestion of making a new property proposal, I don't think it's a matter of providing new arguments but of actually addressing the problems and thinking through the various problematic edge cases. For a proposal to go anywhere, it actually needs to provide an answer about how we decide very problematic cases like whether or not Taiwanese is a nationality. Clear policy decisions about such a question in turn risk getting Wikidata censored when they go against government policy in certain regions. ChristianKl❫ 10:14, 14 April 2021 (UTC)

IFSA

Hello. I want to create a new language section for IFSA's Wikipedia article, but the page doesn't let me create it. Do I need permission?

(Another) Wikidata Browser Extension

I'm finishing up a browser extension I've been working on that allows you to view and edit Wikidata information from Wikipedia. Here's a short demo video showing off basic functionality.

I'm interested in seeing if there are other people potentially interested in using this, hearing more about it or collaborating. It's similar to Wikidata:Tools/Wikidata for Firefox but has different eventual goals, a different user experience (Q1047808) and intends to support all browsers. I've used it to make a bunch of edits and I find it pretty convenient. BrokenSegue (talk) 03:00, 9 April 2021 (UTC)

That looks great! I would love to give it a try, you could be simply reading Wikipedia articles and contributing at the same time. Nice work AntisocialRyan (talk) 02:35, 10 April 2021 (UTC)
@BrokenSegue: Is this available yet to try? AntisocialRyan (talk) 19:18, 15 April 2021 (UTC)
@AntisocialRyan: ah sorry. It's currently awaiting approval by the chrome app store. No idea how long that will take. I could also distribute it for manual install though but there are still some known bugs/quirks. BrokenSegue (talk) 21:55, 15 April 2021 (UTC)
@BrokenSegue: Nice, can't wait! AntisocialRyan (talk) 22:23, 15 April 2021 (UTC)
It looks fantastic. I love the use of color to make it clear what is missing. Where can I read the code and try it out?--So9q (talk) 04:11, 12 April 2021 (UTC)
@So9q: It's currently awaiting approval by the chrome app store. The code is here though if you wanna try to build it. BrokenSegue (talk) 21:55, 15 April 2021 (UTC)

Correcting properties for "music of country" items

Currently, all data in such items are filled in inconsistently and do not follow a single scheme, like music of the Nordic countries (Q26302245), music of the United Kingdom (Q268673), music of Israel (Q3858). I would like to have a common scheme for marking such items.

1. Such articles should be filled with music by country or region (Q75054287). Should it be instance of (P31) or subclass of (P279)? How legitimate/correct is such a group item at all, and should it not instead be covered by music with the qualifier "country" or "of"?

2. Should such articles be referred to as music genre (Q188451)? This has a basis as they could be considered as meta-genres for regional smaller genres, and most of the current genre uses are referenced by RYM's regional genre tree.

3. Also, it seems that such items should have uniform names, such as "music of Australia", and not "Australian music", or the opposite. Which is preferable?

4. Are there supposed to be other parameters that will apply to all such items?

5. How preferable are the new categories, such as "music by nationality or ethnicity" (as many items refer to music of indigenous people instead of any particular country or region, like Tswana music (Q263524), Romani music (Q1268283), Pueblo music (Q7258480)) or "music by language" (same, we have "music of Spain (Q964987)", "Latin music (Q18345375)", "Category:Spanish-language music (Q6282163)", can't find the articles, not the categories, but I've definitely seen such). Or each of them should be covered by qualifiers - ethnic group and language?

Solidest (talk) 17:00, 10 April 2021 (UTC)

Ok, as there have been no answers so far. Yesterday I completely re-worked all the "music of place" items for European countries (+ some other regions). Here is the scheme I have been working on, which I would like to consolidate:
1. Each item dedicated to "music by country or region" should be filled as instance of (P31): music by country or region (Q75054287). A subclass of (P279) should be filled with parent geographical level, hierarchically like: neighborhood - city - region - country - group of countries - part of continent - continent - world music (Q205049) (aka regional music) or just music (Q638) for music of the Americas (Q6942344) (because of the specificity of the term world music (Q205049)).
2. All such articles should be marked as instance of (P31): music genre (Q188451), but only at the country level and above. The reason for this is a logical and ideal order for cataloguing such items, as well as the mentioned source RateYourMusic, which reproduces this categorization in the best way. For sub-regions within a country, there should be only music by country or region (Q75054287), without "music genre". However, if there is an entry on RYM for such a sub-region term, or any other source indicating that it is a genre, then it should also be marked as "music genre".
3. All such entries should be named as "music of x".
4. It's probably worth specifying the country (Q6256) separately (or also as a classifier of music by country or region (Q75054287) if this is really needed).
5. music by nationality or ethnicity is still preferable and I'll probably create it; it should be used with the ethnic group (Q41710) qualifier and exist as a parallel category to "music by country or region". "music by language", on the other hand, seems unnecessary for now, since it mostly relates to categories without articles.
6. Specifically, I would like to mention the difference between general "music of place" items and "traditional music of place"/"folk music of place". It's worth clarifying because "music of country" items often have descriptions like "overview of musical traditions of country", which may imply that such items are dedicated to traditional music only, but this is incorrect. Almost always these are two different terms, which should have two different items, while they are often incorrectly treated as one item. For example, correct use is music of Bulgaria (Q1003981) and Bulgarian folk music (Q12274232) or music of Romania (Q852210) and Romanian traditional music (Q12736123). Generally, "music of country" = traditional music + classical music + modern genres. So in most cases, "music of country" items should not have subclass of (P279): traditional music (Q235858), except when the term itself only applies to traditional music (such as the music of a historic region that no longer exists).
7. Regional genres such as Belgian jazz (Q2473848), African heavy metal (Q17510016), etc should be marked as music genre (Q188451) and music scene (Q28820001), and subclass of (P279): parent genre like jazz (Q8341) / heavy metal (Q38848) + music of place items like music of Belgium (Q2587471) / music of Africa (Q369820). It should not be marked as music by country or region (Q75054287).
Solidest (talk) 11:52, 16 April 2021 (UTC)
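Points 1 and 2 of the scheme above can be written down as a small rule, which may make the proposal easier to check against real items. This is a rough sketch only: the level names follow the hierarchy listed in point 1, while the function name and the `source_says_genre` flag are made up for illustration.

```python
# Geographic levels from point 1, ordered from smallest to largest.
LEVELS = [
    "neighborhood",
    "city",
    "region",
    "country",
    "group of countries",
    "part of continent",
    "continent",
]

MUSIC_BY_COUNTRY_OR_REGION = "Q75054287"
MUSIC_GENRE = "Q188451"


def expected_instance_of(level, source_says_genre=False):
    """Return the instance of (P31) values the proposed scheme would expect.

    Every "music of place" item gets Q75054287. Items at country level and
    above also get music genre (Q188451); smaller places get it only if a
    source (for example RYM) treats them as a genre.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    values = [MUSIC_BY_COUNTRY_OR_REGION]
    country_or_above = LEVELS.index(level) >= LEVELS.index("country")
    if country_or_above or source_says_genre:
        values.append(MUSIC_GENRE)
    return values


print(expected_instance_of("country"))
print(expected_instance_of("city"))
print(expected_instance_of("city", source_says_genre=True))
```

A bot or query that compares an item's actual P31 values against this list would flag the inconsistently filled items mentioned at the start of the thread.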

How to create a list of examples?

For example, I would like to create a list of English semordnilap words. The items are examples of such words. So, they are not exactly standalone items.

I assume the item representing such a list should be: instance of Wikimedia list article. What is the preferred statement for items representing examples of words? Usage example?  – The preceding unsigned comment was added by D. Senkyr (talk • contribs) at 18:26, 10 April 2021 (UTC).

Need weekly/daily data to import in Wikidata

Hello, I am currently developing a tool to update Wikidata https://github.com/LeMyst/WikibaseIntegrator, but I would need data that need to be updated weekly or daily, would anyone have an idea about this kind of data? It's mainly to test my code (under supervision). I currently test it on my own wikibase, but my data are only updated every ~3 months, and because of that the development of my code is slower. Thank you, Myst (talk) 09:41, 12 April 2021 (UTC)

  • Are you looking for new items to create (similar to Pi bot) or information to add to existing ones? --- Jura 10:33, 12 April 2021 (UTC)
    • Both, but I think adding to existing ones (updating a value) is easier to maintain. Myst (talk) 12:46, 12 April 2021 (UTC)
    • Maybe there are three types:
      - (A) new items to be created for new Wikipedia articles
      - (B) new items to be created from new information in Wikipedia,
      - (C) new facts in Wikipedia to be added as statements to existing items.
      For
      - (A) User:Pi bot does some and, e.g. for people, adds a few additional statements. For many, more statements could probably be added directly. To find them, I suppose you have to go through its contributions or check, e.g. Wikidata:Database reports/without claims by site/enwiki.
      - (B) pages in w:Category:2021 American television seasons get incremental additions of new episodes. If these are reasonably complete and stable, you could use that to create new items for episodes. Sample: Q106426978 from w:Younger_(season_7)#Episodes.
      - (C) In itwiki, a template used in all biographical articles compares its information with Wikidata statements. When these are missing, the articles get added to it:Categoria:Dati incrociati con Wikidata - Template Bio. From there, some tools import(ed) the information to Wikidata. Similar categories exist in other wikis/fields and can be, or already are, regularly imported. Still, much can probably be done. @Myst: --- Jura 11:57, 13 April 2021 (UTC)
  • If you want just to test your code, you'd better write mock-based unit tests. Personally, I did not encounter Wikidata items which are edited exactly once a day. --Lockal (talk) 11:16, 12 April 2021 (UTC)
    • I already have some unit tests, but I can't have all the cases with them. Myst (talk) 12:46, 12 April 2021 (UTC)
Every tennis player has three properties that could be updated weekly from the source, the ATP or WTA websites: prize money (P2121), singles record (P564) and doubles record (P555), and maybe ranking (P1352) as well. Edoderoo (talk) 11:48, 13 April 2021 (UTC)

Capitalized labels and descriptions

How do I get a list of all labels and descriptions that start with a capital letter in Bosnian? – Srđan (talk) 22:14, 12 April 2021 (UTC)

Hello @Srđan. Here is an example query:
select * where {
  ?item rdfs:label ?label .
  filter (lang(?label) = "bs" && substr(?label, 1, 1) = ucase(substr(?label, 1, 1)))
} limit 10
Best wishes. Toni 001 (talk) 07:26, 14 April 2021 (UTC)
@Toni 001: Thanks! What about descriptions that are capitalized? How do I get a list of those? – Srđan (talk) 08:05, 14 April 2021 (UTC)
You would replace rdfs:label with schema:description. Unfortunately that query times out. A solution might be to focus on an area of interest by adding additional restrictions. In this example, we query for descriptions of people:
select * where {
  ?item wdt:P31 wd:Q5 .
  ?item schema:description ?desc .
  filter (lang(?desc) = "bs" && substr(?desc, 1, 1) = ucase(substr(?desc, 1, 1)))
} limit 10
Toni 001 (talk) 08:12, 14 April 2021 (UTC)
@Srđan, Toni 001: SPARQL is not useful here; it is pretty inefficient for bulk string operations in general, and the 1-minute timeout limit makes it practically impossible to retrieve a reasonable amount of results. I would go with an SQL query as in quarry:query/40118 for descriptions only, where all ~419.000 results have been queried in 10–15 minutes. Relevant documentation is here. The query can be forked and modified by other users. —MisterSynergy (talk) 09:00, 14 April 2021 (UTC)
Wow, that is a lot more efficient. Thanks! – Srđan (talk) 10:34, 14 April 2021 (UTC)
Thanks. Great to know that there is SQL access to some data. Toni 001 (talk) 06:54, 16 April 2021 (UTC)
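One caveat about the filter used in the SPARQL queries above: comparing the first character with its upper-cased form is also true for digits and punctuation, because upper-casing leaves them unchanged. The sketch below mirrors that comparison in Python and shows a stricter variant; both function names are made up for illustration.

```python
def sparql_style_match(text):
    """Mirror of the filter: substr(x, 1, 1) = ucase(substr(x, 1, 1))."""
    first = text[:1]
    return first == first.upper()


def starts_with_capital_letter(text):
    """Stricter check: the first character must also change when lower-cased."""
    first = text[:1]
    return first == first.upper() and first != first.lower()


for sample in ["Sarajevo", "Šibenik", "grad u Bosni", "1984 novel", "(disambiguation)"]:
    print(sample, sparql_style_match(sample), starts_with_capital_letter(sample))
```

The same idea should carry over to SPARQL or SQL by additionally requiring that the first character differs from its lower-cased form (SPARQL 1.1 has an lcase function for this), though that has not been tested against the query service here.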

P1876 says Q188924 is not a subclass of vehicle, but it is...

In Q201575 (D. João I), P1876 (vehicle) is throwing an error with Q188924 (galley), saying it is not a subclass of Q42889 (vehicle), but it is. Any idea of what is wrong there?-- Darwin Ahoy! 01:06, 13 April 2021 (UTC)

  • vehicle (P1876) requires an item about an instance, not a class. --- Jura 08:58, 13 April 2021 (UTC)
@Jura1: What should be used there to state that the journey was made by galley, then?-- Darwin Ahoy! 13:57, 13 April 2021 (UTC)
The specific ship used, I believe. Like in first voyage of Christopher Columbus (Q3771259). If there is no item for it, not sure then. AntisocialRyan (talk) 14:07, 13 April 2021 (UTC)
The same problem happens on other items, like Aviogenex Flight 130 (Q16983329). Are they wrong, or should the constraint be relaxed to allow classes too? Ghouston (talk) 00:31, 14 April 2021 (UTC)
@Jura1:I believe it can be solved by creating a new property "means of transportation".--MathTexLearner (talk) 11:34, 14 April 2021 (UTC)
I agree, this property/qualifier seems to be missing.-- Darwin Ahoy! 23:26, 16 April 2021 (UTC)
There would be a lot of overlap between two properties like that. Which one should be used on Aviogenex Flight 130 (Q16983329)? Ghouston (talk) 02:40, 17 April 2021 (UTC)

Wikidata workshop at ISWC, October 2021, online

Dear colleagues,

The Wikidata workshop at ISWC goes into the second edition! Please find more information here: https://wikidataworkshop.github.io/2021/

Cheers, Ls1g (talk) 10:02, 13 April 2021 (UTC)

Is Kim Kardashian a politician?

Kim Kardashian (Q186304) has occupation (P106) socialite (Q512314). So far, all good.

But is a socialite really a subclass of an aristocrat (Q2478141)?

If it is, then Kim Kardashian (and all other socialites) would not only be an aristocrat, but also a ruler (Q1097498) and a politician (Q82955).

Does that make sense?

(was not logged in before - now I am Cheeeeesus (talk) 12:09, 13 April 2021 (UTC))
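The inference in question is just the transitive closure of subclass of (P279). Here is a toy sketch of it. Only the socialite to aristocrat link is stated in this thread; the two further links are an assumption made to reproduce the chain described above, and the function name is hypothetical.

```python
# Toy subclass-of (P279) edges: child -> list of parents.
SUBCLASS_OF = {
    "Q512314": ["Q2478141"],   # socialite -> aristocrat (the edit being questioned)
    "Q2478141": ["Q1097498"],  # aristocrat -> ruler (assumed for illustration)
    "Q1097498": ["Q82955"],    # ruler -> politician (assumed for illustration)
}


def superclasses(qid, edges):
    """Return every class reachable from qid by following subclass-of edges."""
    seen = set()
    stack = [qid]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# With the questioned edge in place, a socialite is inferred to be a politician.
print("Q82955" in superclasses("Q512314", SUBCLASS_OF))

# Without that single edge, the inference disappears.
without_edge = {k: v for k, v in SUBCLASS_OF.items() if k != "Q512314"}
print("Q82955" in superclasses("Q512314", without_edge))
```

This is why one questionable subclass statement near the top of the hierarchy can affect every item that uses the class below it.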


You could ask the user who added it: [1]. --- Jura 12:11, 13 April 2021 (UTC)
This change has been online for almost two years. So I assume there is a consensus that it is okay? Cheeeeesus (talk) 12:14, 13 April 2021 (UTC)
Okay, I asked the user. Not sure if they answer, their last answer is from 2019. If not, I'm going to reverse their edit, hope that's okay. Cheeeeesus (talk) 12:35, 13 April 2021 (UTC)
Wikidata is by no means guaranteed to make sense, ever. -Animalparty (talk) 02:50, 14 April 2021 (UTC)
I don't expect a guarantee. But, do you guys have some kind of QA system, like marking some nodes or triples as final, so they cannot be edited anymore without special privileges? I'm thinking of the core ontology of Wikidata, those basic facts that are not really disputed. How do you deal with someone removing the triple Q2 P1419 Q185969 and replacing it by Q2 P1419 Q238231? Is such a change detected and reversed automatically, or could it be online for months and even land in a monthly data dump? Cheeeeesus (talk) 07:12, 14 April 2021 (UTC)
Wikidata's QA system works through watchlists. Central items that many people care about are often on the watchlists of multiple people, and if any of those who check their watchlist disagree with an edit, they revert it. That system works well in areas where enough people care about the items and have them on their watchlists, and less well in areas where that isn't the case.
The number of people who care about the core ontology of Wikidata unfortunately isn't that big, so a lot of its organization is far from final. ChristianKl❫ 13:27, 14 April 2021 (UTC)
I see, thanks. Wouldn't it be great if some triples could be marked as final? Of course this would need some kind of process, e.g. a "Request for finalization" which can be granted by admins. Everything that is either a) true by definition (a woman is a human being), b) an undisputed scientific fact (the earth's shape is a geoid), or c) has been settled in an extensive discussion (of which I'm sure there are many examples), every triple that is in one of these categories is marked as final. No edit-reverse games anymore, and fewer dubious triples. Cheeeeesus (talk) 13:37, 14 April 2021 (UTC)

Items update request

Please add:

Thanks in advance!!!

--2001:B07:6442:8903:5825:5ACA:9E09:78EB 16:15, 13 April 2021 (UTC)

Did you see Wikidata:Project_chat/Archive/2021/03#Item_edit_request ? --- Jura 16:37, 13 April 2021 (UTC)
@Jura1: I have added in Wikidata:Status_updates/Next#Other_Noteworthy_Stuff, but Mahir256 reverted my edit. Why? --2001:B07:6442:8903:18FD:91E9:BEDA:F82A 07:34, 14 April 2021 (UTC)
It was about the announcement of the Wikipedia edition. That was actually done, see Wikidata:Project_chat/Archive/2021/03#Wikidata_weekly_summary_#461.
Also, did you consider creating a user account to enable you to edit (semi-protected) Wikidata yourself? --- Jura 20:00, 14 April 2021 (UTC)

Everything's added. I'm gonna go through the tayWP to find smth else that's unconnected. --Wolverène (talk) 07:54, 14 April 2021 (UTC)

P.S. tay:kin iniptnaq:無連接頁面, if anyone is interested. --Wolverène (talk) 07:59, 14 April 2021 (UTC)

@Wolverène: you missed Cuba :-) --2001:B07:6442:8903:18FD:91E9:BEDA:F82A 08:24, 14 April 2021 (UTC)

Jura1 did it.:) --Wolverène (talk) 08:26, 14 April 2021 (UTC)
oh I hadn't seen, sorry... :-) --2001:B07:6442:8903:18FD:91E9:BEDA:F82A 08:30, 14 April 2021 (UTC)

Cho/CHO

Q450350 and Q5011249 need to be merged together. They are the Q-pages for the disambiguation page "Cho"/"CHO", where only one of the pair exists on each language Wikipedia. -- 67.70.27.246 06:41, 14 April 2021 (UTC)

That's not true given that EnWiki has pages for both. In general we also don't merge different disambiguation pages together. ChristianKl❫ 10:16, 14 April 2021 (UTC)
@ChristianKl: But you're not right either. In the English Wikipedia, there are no two disambiguation pages for Cho / CHO, but there is only one for Cho. The link saved in the data object CHO (Q5011249) is only a redirect. Nevertheless, I am also against merging the two data objects. --Gymnicus (talk) 13:43, 14 April 2021 (UTC)
Wikidata:WikiProject Disambiguation pages/guidelines doesn't discuss it: what should happen in that case with disambiguation pages that have mixed capitalization, such as en:Cat (disambiguation): it's very common, and maybe there are situations where different languages have done it differently. Ghouston (talk) 23:09, 14 April 2021 (UTC)
All the various Wikipedias listed in these two Q-objects use mixed cases. If this were still in the old days with interwiki links placed on the pages themselves, then all these pages would be interlinked. However, now with Wikidata, this is no longer the case, seemingly making the case that one cannot access various Wikipedias properly in the cases of combined case disambiguation pages. -- 67.70.27.246 01:29, 15 April 2021 (UTC)
I wouldn't have any objection to merging the "Cho"/"CHO" items. The only difficulty is when a Wikipedia has 2 pages for disambiguations that differ only in case. I think the best procedure in that case would be to add as many sitelinks as possible to a "main" Wikidata item, and use additional items as required for the Wikipedias with multiple pages. Ghouston (talk) 02:10, 15 April 2021 (UTC)

Dealing with spelling errors in taxonomic scientific names

Can someone please clarify, before I do something wrong and have the edit police on my back, exactly how spelling errors in taxonomic scientific names are to be handled? For example, Q2920014 is improperly spelled. The epithet should be 'erawan', as seen here: https://wsc.nmbe.ch/species/11516

The 'erewan' spelling error has been commonly used (for example, on the various linked Wikipedia pages) and has propagated around, but it is wrong. Is it acceptable to simply edit the name and label, or does a new entry with the right spelling need to be created? CanadianCodhead (talk) 14:24, 14 April 2021 (UTC)

To avoid a revert war running down through the ages, possibly best to deprecate the incorrect value statement with a reason for deprecation (P2241) qualifier specifying the error - e.g. misspelling (Q1984758) - and create a new statement for the correct spelling. --Tagishsimon (talk) 15:29, 14 April 2021 (UTC)
The best way to avoid an edit war would be to make sure to add a reference where the misspelling is specifically indicated/discussed/corrected as such (as the ICZN includes provisions that may cause a misspelling to become the correct spelling). Circeus (talk) 17:39, 14 April 2021 (UTC)

Request to link several (1000+) pages

I run CrowleyBot on zhwikt, and imported some pages from enwikt. The page list is here. EdwardAlexanderCrowley (talk) 01:37, 15 April 2021 (UTC)

@EdwardAlexanderCrowley: I'm not quite sure what you're requesting, and the page list you link doesn't make sense to me. However, please note that regular Wiktionary pages are not linked to Wikidata as sitelinks - see Wikidata:Wiktionary/Sitelinks for more details. ArthurPSmith (talk) 17:37, 15 April 2021 (UTC)
Those are module pages. All of them appear on both enwikt and zhwikt.
Also, Module:languages/data3/* are already linked on Wikidata, so all Module subpages should be linked, too. EdwardAlexanderCrowley (talk) 03:18, 16 April 2021 (UTC)
You may first use PetScan to create items for English modules, then use QuickStatements to add Chinese sitelinks.--GZWDer (talk) 21:48, 16 April 2021 (UTC)
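For the second step, a batch of QuickStatements commands could be generated from a list of item/title pairs. The sketch below assumes the tab-separated V1 syntax, in which a sitelink is written as `S` followed by the site id (here `zhwiktionary`). The item ID and page titles are placeholders, not real data, and the function name is made up.

```python
def sitelink_commands(pairs, site="zhwiktionary"):
    """Turn (item id, page title) pairs into QuickStatements V1 sitelink lines."""
    lines = []
    for qid, title in pairs:
        # One command per line: item, S<site id>, quoted page title.
        lines.append(f'{qid}\tS{site}\t"{title}"')
    return "\n".join(lines)


# Placeholder pairs: in practice the item IDs would come from the PetScan run.
example_pairs = [
    ("Q999999991", "Module:example one"),
    ("Q999999992", "Module:example two"),
]
print(sitelink_commands(example_pairs))
```

It is worth pasting a handful of lines into QuickStatements first and checking the result before submitting all 1000+.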

Universal Code of Conduct consultation: the summary of Wikidata consultation is online

Hello everyone, just a quick note to inform you that the summary of the Wikidata consultation about the Universal Code of Conduct is online on Meta at m:Universal Code of Conduct/2021 consultations/Enforcement/Wikidata community.

If you have any comment, question, clarification or anything else, please let me know here or on the summary's talk page on Meta. Also, if you want to help me translate the summaries, that would be very much appreciated!

Thanks again for your help and support! Sannita (WMF) (talk) 13:47, 15 April 2021 (UTC)

Strange statement

Wikidata (Q2013) has "Rogerio da silva santana" in its instance of (P31) statement. Why? --2001:B07:6442:8903:9DA4:471F:AF5A:CA34 14:24, 15 April 2021 (UTC)

Label vandalism. https://www.wikidata.org/w/index.php?title=Q33120876&action=historysubmit&type=revision&diff=1399508900&oldid=1366679073 --Tagishsimon (talk) 17:43, 15 April 2021 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Matěj Suchánek (talk) 09:55, 18 April 2021 (UTC)

Important negative statements about Wikidata entities

We have published the Wikinegata platform for browsing interesting negations about Wikidata entities. [Video overview].

  • Entity Summarization: The main function allows users to search for interesting negations about entities of their choice. In the figure, we learn that Einstein hasn't formally supervised any PhD students.
[Figure: Entity Summarization]
  • Question answering: Our platform offers a question answering function. One can search for entities using negative statements, where the entity is a variable. In the figure, we learn about people who have no academic degree.
[Figure: Question Answering]

Why Wikinegata?

Like most major KBs, Wikidata is incomplete and therefore operates under the open world assumption (OWA): statements not contained in Wikidata should be assumed to have an unknown truth value. The OWA ignores, however, that a significant part of interesting knowledge is negative, which cannot be readily expressed in this data model.

The platform is built upon the peer-based inference methodology: Given an entity 𝑒, compile a ranked list of interesting grounded negative and universally negative statements.

Grounded negative statement ¬(subject, predicate, object)

¬(Stephen Hawking, award, Nobel Prize in Physics) "Stephen Hawking has not won the Nobel Prize in Physics."

Universally negative statement (subject, predicate, ∅)

(Leonardo DiCaprio, spouse, ∅) "Leonardo DiCaprio has never been married."

Read more about Negative Knowledge in Open-world Wikidata at: https://wikiworkshop.org/2021/papers/Wiki_Workshop_2021_paper_3.pdf

Note: Since this is indeed a challenging problem, especially when it comes to absence due to KB incompleteness vs. absence because of actual negation, we discuss in Section 5 of the paper the lessons we learned while developing the system as well as using it to find modelling issues in Wikidata.

but we can represent negative statements. we can add them as deprecated, no? BrokenSegue (talk) 14:46, 16 April 2021 (UTC)
  • Interesting in principle, but maybe the samples can be improved.
Most popes and a few dead people have spouse (P26)=no value to express that they never married (except popes who did marry, obviously). We try to avoid adding that for living people though.
Template:Positive and negative lists a few related properties. --- Jura 14:56, 16 April 2021 (UTC)
@Jura1: Why are you saying that "We try to avoid adding that for living people though"? Could you tell me when and where it has been discussed or decided please? Ayack (talk) 17:47, 16 April 2021 (UTC)
I presume Jura is saying that because it is the case. Absence of discussion is not absence of observable consensus. --Tagishsimon (talk) 17:59, 16 April 2021 (UTC)
Our data model suggests that QX PY QZ means that there's a point in time at which QX PY QZ is true. Correspondingly, QX PY no value means that there's no point in time at which the claim is true. For living people that means you would need to be able to look into the future to confidently assert QX spouse (P26) no value without qualifiers. A point in time (P585) qualifier would be needed to state that at a particular time a person had no spouse. ChristianKl❫ 21:42, 16 April 2021 (UTC)
It would mean we should assert anyone with
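To see how spouse (P26)=no value is actually used on living people, one option is a query against the Wikidata Query Service. The following is a minimal sketch in Python that only builds the SPARQL string. It assumes the standard WDQS prefixes (wd:, wdt:, wdno:), where a truthy "no value" statement shows up as the item having type wdno:P26. The function name and the use of a missing date of death (P570) as a stand-in for "living" are illustrative assumptions, not anything agreed in this thread.

```python
def novalue_query(prop: str, only_living: bool = True, limit: int = 100) -> str:
    """Build a SPARQL query for humans whose `prop` is set to "no value".

    `prop` is a property ID such as "P26" (spouse).
    """
    lines = [
        "SELECT ?item WHERE {",
        "  ?item wdt:P31 wd:Q5 ;",        # instance of human
        f"        a wdno:{prop} .",        # truthy "no value" for the property
    ]
    if only_living:
        # Assumption: "living" is approximated by having no date of death (P570).
        lines.append("  FILTER NOT EXISTS { ?item wdt:P570 ?death }")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the query so it can be pasted into the query service.
    print(novalue_query("P26"))
```

Note that this only finds unqualified "no value" statements; it says nothing about whether a point in time (P585) qualifier is present.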
  • Just noticed that some people had been deleting statements on popes and we hadn't actually completed (all of) them. Anyways, I restored a few. The avoidance of such statements on living people is more or less the outcome of the data format. Besides, there were some efforts to fill the information on deceased people only. --- Jura 18:34, 16 April 2021 (UTC)
  • "The OWA ignores however, that a significant part of interesting knowledge is negative, which cannot be readily expressed in this data model" This seems like a strange sentiment. The knowledge can be expressed in our data model via features such as no value. Additionally, Wikidata is de facto incomplete and if you naively assume it to be complete you are making a lot of false claims. You could make an argument that deduces that for people like Albert Einstein we can assume all PhD students that exist to be in Wikidata and that all Nobel Prizes are in Wikidata, but from what you tell us about your tool, it doesn't do any work on analysing how likely Wikidata is to be complete and thus whether a missing value is an indication of something not being the case. ChristianKl❫ 21:42, 16 April 2021 (UTC)
  • @Sentiment: You are right about the no-values, my colleague slightly overgeneralized, apologies for that. For negating single statements I haven't seen a convincing solution yet though; Help:Deprecation, as per its definition, seems not intended for this case.
@Sophistication of the inference process: The explanation above isn't complete - the tool does not naively assume that the whole of Wikidata is complete, but only regions which are sufficiently populated among similar entities. So if a lot of entities similar to Hawking have won the Nobel Prize, it assumes that Nobel-prize-winning is notable enough to be complete also for Hawking. In contrast, few of the entities similar to Hawking have philately asserted as occupation, so we do not assume that this information on him is complete, hence would not negate it.
Ls1g (talk) 14:31, 18 April 2021 (UTC)
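To make the peer-based idea above concrete, here is a toy sketch in Python. It is not the Wikinegata code: the function name, the `min_peer_share` threshold and the sample data are all assumptions made up for illustration. A statement is proposed as a candidate negation for an entity only if it is frequent among the entity's peers and absent from the entity itself.

```python
from collections import Counter


def candidate_negations(entity_facts, peers_facts, min_peer_share=0.5):
    """Return ((predicate, object), share) pairs common among peers but absent for the entity.

    entity_facts: set of (predicate, object) pairs known for the entity
    peers_facts:  list of such sets, one per similar entity
    min_peer_share: hypothetical threshold for "sufficiently populated among peers"
    """
    if not peers_facts:
        return []
    # Each peer contributes a fact at most once, because facts are stored in sets.
    counts = Counter(fact for facts in peers_facts for fact in facts)
    n = len(peers_facts)
    ranked = [
        (fact, count / n)
        for fact, count in counts.items()
        if fact not in entity_facts and count / n >= min_peer_share
    ]
    # Most frequent among peers first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Made-up sample data, for illustration only.
entity = {("occupation", "physicist")}
peers = [
    {("occupation", "physicist"), ("award", "Nobel Prize in Physics")},
    {("occupation", "physicist"), ("award", "Nobel Prize in Physics")},
    {("occupation", "physicist"), ("occupation", "philatelist")},
]

if __name__ == "__main__":
    print(candidate_negations(entity, peers))
```

With this sample data the award is kept as a candidate negation because two of the three peers have it, while the rarely asserted occupation falls below the threshold and is not negated.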

Twitch category ID (P4467)

Hi, there is a restriction on this property that it has to have an Internet Game Database game ID (P5794). But the categories for Art and Travel & Outdoors list many different things, and don't have game IDs. How can this restriction be modified? See art and travel for the errors. Thanks, --Funandtrvl (talk) 00:24, 17 April 2021 (UTC)

you can mark certain items as exempt from the restrictions. I added art (Q735) as an exception as an example. BrokenSegue (talk) 03:03, 17 April 2021 (UTC)

On the term structured data

Hello. Yesterday on a Wikidata Telegram group I shared my disappointment with the term "structured data". Since it did find some positive echoes there, I thought it might be interesting to have a wider opinion on the matter, and this seemed to be a good place to start gathering opinions.

So first, let's lay out the criticism addressed to "structured data". Shortly "structured" is far too broad and fuzzy to give fine suggestion of what it is referring to. Standards like ASCII and Unicode structure digital data into encoded text. Most texts out there are structured according to some language. Texts published through a basic MediaWiki instance are structured with Wikicode and HTML. Articles on most Wikimedia projects are structured according to a more or less explicit editorial line. On some wikis, like the Wiktionaries, the Wikicode syntax conventionally structures a typology of lexicographical descriptions – though no technical mechanism enforces these policies. I think that's enough to make the point of how ambiguous "structured data" is.

So secondly, what do we actually mean with structured data in the context of Wikidata? To keep it short, I would propose this definition: "explicitly encoded semantic relations".

Thirdly, here is a proposed alternative term: cohesive data.

Finally, you are invited to respond to the poll: What term do you find more appropriate for referring to data interconnected within a Wikibase instance?

Note that the choices are deliberately limited, following an approach where "less choice is better". But alternative suggestions as well as any other form of gentle feedback are welcome.

Cheers, Psychoslave (talk) 06:31, 17 April 2021 (UTC)

You say <<Shortly "structured" is far too broad and fuzzy to give fine suggestion of what it is referring to>> without troubling to specify what you suppose structured data to refer to. That renders your thesis impenetrable. --Tagishsimon (talk) 11:58, 17 April 2021 (UTC)
Hello @Tagishsimon:. Here is the proposed definition: "explicitly encoded semantic relations". Does it still leave the thesis impenetrable? Note that this was the point of the "secondly" paragraph. Psychoslave (talk) 14:54, 17 April 2021 (UTC)
what is the problem you are trying to solve? structured data has a meaning that does not apply to ASCII text and while marked up text has some structure (especially the infobox templates) few would call it structured. BrokenSegue (talk) 13:21, 17 April 2021 (UTC)
One problem is the ambiguity of the term. So one aim is to clear up misunderstandings. The first link I find when looking for "structured data" is SEC.gov | What Is Structured Data?. In particular it says "The granularity of these pieces can range from an individual data point, such as a number (e.g., revenues), date (e.g., the date of a transaction), or text (e.g., a name), to data that includes multiple individual data points (e.g., an entire section of narrative disclosure).". The granularity specified in this particular definition leaves a lot of room for interpretation, and clearly encompasses something like requesting an entire section of a MediaWiki article through an API call. If this is what is expected from a structured data source, then a Wikipedia article is clearly a structured data source. And a Wiktionary edition that has conventions to use wikicode syntax for structuring all lexicographical articles is a structured data source with very fine granularity. So the term loses all its purpose, since it fails to differentiate between a data source powered by a bare MediaWiki instance and another one using Wikibase to explicitly encode data interrelations.
A second problem is that it promotes the false assumption that whatever doesn't fall under the definition of "structured data" should be called "unstructured data", whatever the definition given to this term. Searching for "structured data" will turn up many examples of articles falling into this fallacy. So a second aim is to dispel the vilification of other forms of data sources, although it is not assumed here that this kind of "incidental obloquy" is anything like an evil plot. That just happens to be a factual consequence.
Note that these are problems from my perspective. So, on another level, the answer to your question "what is the problem you are trying to solve?" is that I'm trying to figure out how widely this perspective is shared or not, thus the poll.
Cheers, Psychoslave (talk) 15:43, 17 April 2021 (UTC)
@Psychoslave: structured to unstructured data is a spectrum. People even use the term "semi-structured" for data that falls between. I don't think you will find people that would call Wikipedia/Wiktionary "structured data". It might be semi-structured in some places. Wikidata aspires to fully structure the data. I'm unaware of people vilifying unstructured data. It's a neutral term. I see no problem. Inventing a new term doesn't reduce ambiguity. BrokenSegue (talk) 16:14, 17 April 2021 (UTC)
Thanks for the feedback, it will feed my reflections on the matter. Psychoslave (talk) 17:14, 17 April 2021 (UTC)
en:Structured data and en:Data structure are pretty standard terms in a computing context; I don't see the ambiguity you claim. The term is contrasted with en:Unstructured data - generally large blocks of text. I've never heard the term "cohesive data" - do you have a reference on that? ArthurPSmith (talk) 13:21, 19 April 2021 (UTC)

are both Marie Bracquemond (Q56865078) and Marie Bracquemond (Q273552)

Marie Bracquemond (Q56865078) is about journal articles on Marie Bracquemond (Q273552). Is this approach correct? If not, what is the right way to update journal articles on Marie Bracquemond (Q273552)? Gi vi an (talk) 08:04, 17 April 2021 (UTC)

New essay: of (P642) considered harmful

User:Lucas Werkmeister/P642 considered harmful is an essay I wrote, advising against the use of the qualifier of (P642). I invite you to read it and leave feedback on the talk page.

(Disclaimer: this is totally unrelated to my work at WMDE, which is why it’s posted under my private account.) Lucas Werkmeister (talk) 15:38, 17 April 2021 (UTC)

I basically agree that it is impossible for of (P642) to be machine interpretable. It is so very context sensitive. I would like to see a proposal to capture much of the value of of (P642) at the same time as removing it though. Or at the very least guidance on how to do the same thing without it. BrokenSegue (talk) 16:18, 17 April 2021 (UTC)
  • I tend to agree with the analysis, but come to the opposite conclusion: I find it mostly harmless. Obviously, if there is more specific property to do the same, that should be used. --- Jura 17:33, 17 April 2021 (UTC)
Thanks for taking the time to describe what is wrong with this property. Whenever I saw this property being used I somehow felt reminded of some of last century's science fiction trying to imagine how artificial intelligence might sound and reason like; or like the (good faith) intent to store as much information as possible in an item - things that might be better represented by dedicated properties or in a Wikipedia article. Toni 001 (talk) 04:59, 18 April 2021 (UTC)
@Toni 001:, this may sound strange, but current usage is fully compatible even with modern AI development. The biggest working example is the Google Knowledge Graph (there are a few publicly available patents for those who are interested). For a smart enough encoder there is no difference between the sentence "seiyu is the word for voice actors in anime and Japanese films"/"Сэйю — японские актёры озвучивания"/etc., the statement seiyū (Q622807) subclass of (P279) voice actor (Q2405480) / of (P642) Japan (Q17) (as currently stated in seiyū (Q622807)), or any other way to represent the connection to Japan. "of" here is just an object in a latent space of human representation, which various people represent slightly differently. So when the "seiyū" occupation is added to any human, even without a citizenship or birthplace statement, it increases the Bayesian probability of the person being related to Japan. Such implications also play an important role during automatic graph reconciliation (which may even trigger an avalanche effect, e.g. there were hundreds of duplicate songs by a duplicate person -> the person was marked as seiyū -> the algorithm deduced that the person is the same -> the branches were merged). Therefore, even though "P642 considered harmful", there is some justification for using it until a better qualifier is proposed. --Lockal (talk) 09:55, 19 April 2021 (UTC)
I like the essay. --Matěj Suchánek (talk) 10:02, 18 April 2021 (UTC)
We previously had a similarly vague qualifier, "as". We finally got rid of it (object has role (P3831) was created in the process). We simply need to propose some other qualifier as a replacement. BTW: Usage of P31 as a qualifier has long been deprecated but is still common.--GZWDer (talk) 14:14, 18 April 2021 (UTC)
Very good write-up, but still lacks replacement suggestions for many P279 examples there. It raises the same issue as with "P31 as qualifier". It may sound like an easy thing to fix, but when you dip into (currently) 184k constraint violations, it gets tough. --Lockal (talk) 16:41, 18 April 2021 (UTC)

Non-existent site link

In Q4058121, the English Wikipedia article was deleted in 2019. How is it possible that the Wikidata item still displays the sitelink?--Ymblanter (talk) 21:28, 17 April 2021 (UTC)

@Ymblanter: Probably just a glitch. My understanding is that removal of the sitelink from Wikidata happens as part of deleting a page on a client project. I don't think there's any process that follows up to check that has happened correctly.
On a related note, I see the WPEN page was deleted primarily because its creator and primary contributor was blocked, and I see no direct discussion of the page's notability. This makes it a good candidate for undeletion if you wanted to pursue that. Bovlb (talk) 22:02, 17 April 2021 (UTC)
I asked an administrator to take a look at en:User talk:Ruslik0. Ghouston (talk) 00:31, 18 April 2021 (UTC)
@Bovlb: I didn't notice that you are an administrator there too, I guess you can already see the deleted article to answer my question. Ghouston (talk) 00:38, 18 April 2021 (UTC)
Thanks both of you. Yes, the subject is notable, though I did not have a look at the deleted article itself to see whether there is any salvageable material there. But my primary worry here was that some legitimate process resulted in this. I agree that it is likely a glitch.--Ymblanter (talk) 06:46, 18 April 2021 (UTC)
@Ghouston: Sounds like you already got the result you needed. Let me know if I can be of further assistance. Bovlb (talk) 04:58, 19 April 2021 (UTC)
  • I think there is a phab ticket related to it somewhere. Maybe it covers the above use case (mass deletion), maybe not. You might want to bring it up at Wikidata:Contact_the_development_team. --- Jura 12:37, 18 April 2021 (UTC)
    Thanks, will do now.--Ymblanter (talk) 18:41, 18 April 2021 (UTC)

Keith Hartley

In Q7001559, the identifiers refer to both a basketball player and an economist. However, I find no evidence saying they are the same person. 佛祖西来 (talk) 16:48, 18 April 2021 (UTC)

That's a common conflation coming from VIAF, imported by KrBot in 2015. Such items should be split on sight, so I'll split it. As originally this item was created for a basketball player, a new item for an economist should be created. --Lockal (talk) 08:20, 19 April 2021 (UTC)

Use of title (P1476) with biographical article (Q19389637) items?

Can I get some authoritative opinion on expectations ...

With a biographical article like Abauzit, Firmin in s:en:A Biographical Dictionary of Modern Rationalists (A Biographical Dictionary of Modern Rationalists (Q106552352)) would/should we be filling the WD item Abauzit, Firmin (Q106552372) with a title (P1476) even though the biographical section does not itself have a title, just some bolding?

What are the pros and cons of having that field filled? Also noting that we are going to have a whooooooooole lot of works at enWS that are not going to have that field filled. Thanks.  — billinghurst sDrewth 06:16, 19 April 2021 (UTC)

A benefit is to preserve the string "Abauzit, Firmin" which might not survive as an item alias, and which might conceivably respond to some forms of search; similarly using title (P1476) makes that property useful in SPARQL queries, and arguably improves the graph compared to an item omitting P1476. Con is at best pedantic: "is it really the title". If we then look at the index, there's a sub-question of "Airy, Sir George B." or "Airy, Sir George B., K.C.B., D.C.L., LL.D., F.R.S.". Could add both and use a qualifier to specify article title versus index title. In general I'd advocate more is better. --Tagishsimon (talk) 13:13, 19 April 2021 (UTC)

Creating a Wikipedia/Wikidata replica and keeping it in sync

Hello,

I am exploring the possibility of having a self-hosted Wikidata + Wikipedia replica hosted within our Intranet where users could query it via SPARQL. The intranet is not connected at all to the internet so the whole stack needs to be local.

How trivial is it to keep the replica in sync with the "master" Wikipedia and Wikidata instances? I do not need the replica to be in sync in real time. Daily or monthly extracts would be fine. Having to download, transfer and restore full dumps regularly to achieve an up-to-date replica is not feasible; I would need partial updates, e.g. daily or monthly deltas.

What would be the full stack of servers/services required to host such a replica? Would it include such things as the SPARQL query builder found here: https://query.wikidata.org/ ?

Thank you for your time  – The preceding unsigned comment was added by 185.45.52.143 (talk • contribs).

See mw:Wikidata_Query_Service/User_Manual#Standalone_service.--GZWDer (talk) 13:35, 19 April 2021 (UTC)

ISBN-13 warning

Hi, I'm not sure why the ISBN-13 reference I added for the date of birth here has a warning: https://www.wikidata.org/wiki/Q558287 I got the ISBN-13 from Amazon and Google Books, and it looks correct. Tehonk (talk) 14:27, 19 April 2021 (UTC)

@Tehonk: Did you click on the warning icon? When I do that I get an explanation: "The value for ISBN-13 (978-9759544010) should match “13 digits formatted in 5 groups separated by "-", where the 1st group must be "978" or "979". When the 1st group is "978", the 2nd group has 1 digit "0" to "5" or "7", or 3 digits starting by "6", or 2 digits starting by "8", or 2 to 5 digits starting by "9". The last 5th group is a single check digit.” " So some dashes that are expected are missing. I believe however there is a bot that will fix these automatically. ArthurPSmith (talk) 17:24, 19 April 2021 (UTC)
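The difference between "the number is a valid ISBN" and "the number is formatted as the constraint expects" can be shown with a small Python sketch. The function names are mine, and the grouping check covers only part of the constraint quoted above (five hyphen-separated groups, 13 digits in total, a 978 or 979 prefix, a one-digit last group), not the rules for the second group. The check-digit rule is the standard ISBN-13 one, with weights 1 and 3 alternating.

```python
def isbn13_checksum_ok(isbn: str) -> bool:
    """Check the ISBN-13 check digit, ignoring hyphens and spaces."""
    digits = [int(c) for c in isbn if c in "0123456789"]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be a multiple of 10.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def isbn13_grouping_ok(isbn: str) -> bool:
    """Partial format check: five hyphen-separated digit groups, 13 digits in total."""
    groups = isbn.split("-")
    return (
        len(groups) == 5
        and all(g.isdigit() for g in groups)
        and sum(len(g) for g in groups) == 13
        and groups[0] in ("978", "979")
        and len(groups[4]) == 1
    )


if __name__ == "__main__":
    value = "978-9759544010"  # the value from the warning message
    print(isbn13_checksum_ok(value), isbn13_grouping_ok(value))
```

For the value in the warning, the check digit passes while the grouping check fails, which is consistent with the number being right and only the dashes being missing.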

Wikidata weekly summary #464

Workshop: Scholarly citations in Zotero with the power of Wikidata


Hi, all! I'm developing Cita, a Wikidata addon that adds citation metadata support to the open source reference management software Zotero, with a WikiCite grant from the Wikimedia Foundation. On May 31st at 5PM UTC I will be hosting a presentation workshop where I will show how to visualize connections among items in a Zotero library, using information from Wikidata, and how missing citation data can be easily uploaded to Wikidata as well. Please find more information and the pre-registration form here. --Diegodlh (talk) 18:29, 19 April 2021 (UTC)

Inga Sarri

I have added both death date and URL to the Wikidata fact box regarding recently deceased Swedish actress Inga Sarri, but the death date does not show in the Wikipedia format, only the death year. Can you help me?

https://sv.wikipedia.org/wiki/Inga_Sarri

https://www.wikidata.org/wiki/Q4977086 90.235.21.211 21:23, 19 April 2021 (UTC)

The sv template had the date hard coded. Have removed that and now all is well. --Tagishsimon (talk) 21:28, 19 April 2021 (UTC)

How do I remove a Wikidata item that is incorrect?

Wikidata says that a category on Commons, "Shrine of Heer Ranjha, Jhang", is a "popular tragic romances of Punjab", when it is a shrine that is a cultural heritage monument in Punjab, Pakistan. Apparently there is a written work/films etc. called "Heer Ranjha", but this Wikidata item wrongly says this shrine is it. There is no category on Commons regarding this written work that I know of. How do I get rid of this Wikidata item and the linked enwiki article about the "popular tragic romances of Punjab" for the shrine? Thanks, Krok6kola (talk) 22:52, 19 April 2021 (UTC)

Where exactly does wikidata say this? Could you point to the item please. --Tagishsimon (talk) 22:57, 19 April 2021 (UTC)
If you look at the Commons category "Shrine of Heer Ranjha, Jhang", it says it there. I tried to remove it from Wikidata, but I don't think I succeeded. I am really based on Commons (my user page on Commons) and cannot figure out Wikidata. Thank you, Krok6kola (talk) 23:03, 19 April 2021 (UTC)
It is Wikidata Q3631228. Thanks, Krok6kola (talk) 23:06, 19 April 2021 (UTC)