Shortcuts: WD:PC, WD:CHAT, WD:?

Wikidata:Project chat

Wikidata project chat
Place used to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
Please take a look at the frequently asked questions to see if your question has already been answered.
Please use {{Q}} or {{P}} the first time you mention an item or property, respectively.
Requests for deletions can be made here. Merging instructions can be found here.
IRC channel: #wikidata
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2018/10.

Noobie Questions

Now I'm curious about this WikiData thing. I found something needing merging, then came back about "The Dog" ( https://www.wikidata.org/wiki/Q28136699 ). Seeing that no one had touched it I looked into it. The information does not overlap. Then I wondered why I'm unable to add more details to that article.

For example, The Dog has two co-directors and I can't even add one, much less worry about duality.

I find it strange that I can add "terrorism" as one of the main subject qualifiers but I can't add "theft" "robbery" "bank" etc. (I pulled up Iron Man (2008 film) to use as a template/reference for adding details.)

Turns out I can still add a lot in "depicts" and "main subject". Is there a difference? I added much of the same to both.

I stumbled upon a glitch: Adding the film premiere info (URL, title, date, retrieved date, author), all was fine until I tried to add unknown authors, then it would freeze and no longer "publish" - even when I removed the 'author' part. The first time I'd filled out all the things without publish-saving. The second time I'd filled out each item and published one at a time - until I had to refresh Firefox. Similarly, I discovered you have to add the co-directors names and some details before you can even reference them. Empty non-links are not allowed yet people stubs are. Weird.

I also wanted to add the link to my original article https://infogalactic.com/info/The_Dog_(2013_film) (almost identical because I wrote them at the same time - but IG has an image that was removed on WP (who censor everything)).

I have no dog in maintaining "The Dog" or its unimportant article, but I thought it'd be a good enough place to learn about WikiData.

I kinda see the potential but there's sooooooo far to go - and then there's the faaaaaaar bigger problem of the corporatocracy censorship, distortion of events, evasive truth, and intolerance of anti-establishment ideas and authentic freedom. Any A.I. that may use this is already crippled by all the corrupted systems, including legacy media, the limited and limiting so-called source of "facts" allowed on Wikipedia, etc.

Suggestions for good tutorial videos or a short overview would be greatly appreciated. ~ JasonCarswell (talk)

Bad birthdays

Google's Knowledge Graph has informed me that en:Laura Vaccaro Seeger, Q17147769, was born at 00:00:00 on 1 January 1900, which is obviously an error as the world's oldest person was born in 1903. (User:GreenMeansGo has deleted her birth date from the record here). It turns out that this statement was added by Reinheitsgebot four months ago, drawing on her VIAF information, although her VIAF page doesn't have a birth date. I suspect that the bot misinterpreted the complete lack of data as an indication that she was born at 00:00:00 on 00-01-01.

Could someone query the database to find everyone who was allegedly born on 1900-01-01, and then analyse the results to see how many of these dates were added by this bot? I'm just guessing that if there are many of them, most of them will be errors. Nyttend backup (talk) 19:29, 3 October 2018 (UTC)
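A possible starting point for the query requested above, as a minimal sketch rather than a tested report: the Python helper below only builds a SPARQL string for the Wikidata Query Service. It assumes the standard WDQS prefixes, human (Q5) as the class to search, and day precision (value 11) for the stored date. The function name `birth_date_query` is made up for illustration. Note that the query service does not say which account added a statement, so attributing hits to a particular bot would still require checking item histories.

```python
from datetime import date


def birth_date_query(day: str = "1900-01-01", limit: int = 500) -> str:
    """Build a SPARQL query for humans whose date of birth (P569) is exactly `day`.

    Only statements stored with day precision are matched, so "born in 1900"
    (year precision) is not reported.
    """
    # Normalise and validate the input; raises ValueError for malformed dates.
    day = date.fromisoformat(day).isoformat()
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;
          p:P569/psv:P569 ?node .
  ?node wikibase:timeValue "{day}T00:00:00Z"^^xsd:dateTime ;
        wikibase:timePrecision 11 .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
LIMIT {int(limit)}
""".strip()


if __name__ == "__main__":
    # Paste the printed text into the query service to run it.
    print(birth_date_query())
```

The result list can then be sampled by hand to see how many of the dates trace back to the same import.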

<ns1:birthDate>1950</ns1:birthDate>
<ns1:deathDate>0</ns1:deathDate>
<ns1:dateType>flourished</ns1:dateType>

Date types can be 'lived', 'circa', or 'flourished'. To quote https://github.com/OCLC-Developer-Network/viaf-dates/blob/master/README.md:

'lived' says the dates are birth and death dates and should be accurate += 3 years. 'circa' says the dates are a guess and should be given a 10 year error of margin. 'flourished' are more likely to be dates the person worked or a century date and are given 100 years leeway.

In this case work period (start) (P2031) is a more appropriate property than date of birth (P569) (diff). But even when VIAF states dateType=lived, it can still be incorrect or indicate something else – e.g. https://viaf.org/viaf/72927896/viaf.xml for Ebenezer Hewlett (Q18671012):

<ns1:birthDate>1737</ns1:birthDate>
<ns1:deathDate>1747</ns1:deathDate>
<ns1:dateType>lived</ns1:dateType>

One more example – Mahākāśyapa (Q335304) hadn't died at age 1 (diff) https://viaf.org/viaf/62449721/viaf.xml:

<ns1:birthDate>-550</ns1:birthDate>
<ns1:deathDate>-549</ns1:deathDate>
<ns1:dateType>flourished</ns1:dateType>

Please see this article as well: Parsing and Matching Dates in VIAF. Santer (talk) 14:11, 11 October 2018 (UTC)
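To illustrate the point about date types, here is a rough Python sketch of how an import script might read the three fields shown above before choosing a property. Several things in it are assumptions and not part of VIAF's documented behaviour: the wrapper element `dates`, the namespace URI bound to `ns1`, treating a value of `0` as "unknown", and using the birth and death properties for `circa`. The leeway values come from the README quoted above, and the use of work period (start) (P2031) for `flourished` follows the suggestion in this thread.

```python
import re
import xml.etree.ElementTree as ET

# Leeway in years per VIAF dateType, as quoted from the viaf-dates README.
LEEWAY_YEARS = {"lived": 3, "circa": 10, "flourished": 100}

# Suggested (start, end) properties per dateType. The mapping for "circa"
# is an assumption; the thread only discusses "lived" and "flourished".
PROPERTIES = {
    "lived": ("P569", "P570"),         # date of birth / date of death
    "circa": ("P569", "P570"),
    "flourished": ("P2031", "P2032"),  # work period (start) / (end)
}

# Hypothetical wrapper so the fragment parses; the namespace URI is assumed.
SAMPLE = """<ns1:dates xmlns:ns1="http://viaf.org/viaf/terms#">
  <ns1:birthDate>1950</ns1:birthDate>
  <ns1:deathDate>0</ns1:deathDate>
  <ns1:dateType>flourished</ns1:dateType>
</ns1:dates>"""


def _year(value: str):
    """Return the leading (possibly negative) year, or None for '0' or empty."""
    match = re.match(r"-?\d+", value)
    if not match or int(match.group()) == 0:
        return None  # assumption: 0 means the date is unknown
    return int(match.group())


def interpret(xml_text: str) -> dict:
    """Map a VIAF date fragment to suggested properties, years and leeway."""
    root = ET.fromstring(xml_text)
    # Index elements by local name so the namespace prefix does not matter.
    fields = {el.tag.rsplit("}", 1)[-1]: (el.text or "").strip() for el in root.iter()}
    date_type = fields.get("dateType", "")
    start_prop, end_prop = PROPERTIES.get(date_type, (None, None))
    return {
        "date_type": date_type,
        "start": (start_prop, _year(fields.get("birthDate", ""))),
        "end": (end_prop, _year(fields.get("deathDate", ""))),
        "leeway_years": LEEWAY_YEARS.get(date_type),
    }


if __name__ == "__main__":
    print(interpret(SAMPLE))
```

Even with such a mapping, the Hewlett example shows that `lived` values still need a plausibility check before import.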

Using 'imported from Wikimedia project' for adding a reference calls anything and everything

When adding a reference using imported from Wikimedia project (P143), it doesn't restrict the suggested list to Wikimedia project (Q14827288), which necessitates extra typing and increases the chance for errors. Can this be fixed? Abductive (talk) 04:34, 6 October 2018 (UTC)

  • You can ask for it at Wikidata:Suggester ranking input. --- Jura 08:11, 6 October 2018 (UTC)
  • You shouldn't attempt to add such a reference in the first place. Matěj Suchánek (talk) 09:35, 6 October 2018 (UTC)
    • @Matěj Suchánek: Why not? - Jmabel (talk) 16:17, 6 October 2018 (UTC)
      • This property is only intended for use by bots, not humans (hence it's been blacklisted from the default suggestions). Matěj Suchánek (talk) 08:34, 7 October 2018 (UTC)
        • I am not convinced that we should deter people from using this property manually. If people want to mark a claim as being inferred from Wikipedia, this property seems appropriate. − Pintoch (talk) 08:58, 7 October 2018 (UTC)
      • Occasionally I add that manually (e.g. "imported from xyzwiki"), especially when there are countless choices and users might want to go dig for additional sources. I agree that it shouldn't be on the default choices suggested to users.--- Jura 11:50, 7 October 2018 (UTC)
  • What User:Matěj Suchánek says above directly contradicts what I was told on this very page barely a month ago, in a discussion that led to this change in imported from Wikimedia project (P143) as a supposed clarification. There were similar changes in several other languages. What Matěj is saying had been my previous understanding; I was kind of "newbied" and told in no uncertain terms that I was wrong. - Jmabel (talk) 17:27, 7 October 2018 (UTC)
  • The discussion to which I am referring. - Jmabel (talk) 17:29, 7 October 2018 (UTC)
    • It's just that the scope of the two properties is changing. The label of p143 was changed not too long ago and uses for projects other than WMF sites are slowly moved to "stated in". --- Jura 18:31, 7 October 2018 (UTC)
      • The change referred to by User:Jura1 (at least in English) appears to be this one. Jc3s5h (talk) 19:08, 7 October 2018 (UTC)
        • @Jc3s5h, Jura1: Jura's link, as far as I can tell, has to do with the original question asked here, not with my remark, to which, judging by indentation, Jura appears to respond. If I've misunderstood what's happening here, then my sincere apologies, but so far I stand by what I wrote at 17:27 and 17:29, 7 October 2018 (UTC). - Jmabel (talk) 23:06, 7 October 2018 (UTC)
    • The point I was making was not to bother with making it easier for users to insert discouraged references. I'm not saying you should never insert such a reference (otherwise it would have already been banned) but that we shouldn't make it easier. Matěj Suchánek (talk) 16:50, 8 October 2018 (UTC)
      • Sounds like a clusterfuck. Of course you should make it easier. Or tell people that they are not required to provide a reference, since references are for people who hate data and think databases should be outlawed. Abductive (talk) 08:35, 9 October 2018 (UTC)
        • It should be easier to insert proper references, not imported from Wikimedia project (P143). Matěj Suchánek (talk) 15:36, 10 October 2018 (UTC)
          • If that's the case, then a bot should go around and remove them all. Abductive (talk) 11:33, 11 October 2018 (UTC)
            • That's a bad idea. (But I hope the intention of your comment was satirical/sarcastic). For one thing, it's useful to have a note that a statement came from a Wikipedia in case it turns out to be wrong. Then there's at least a chance that a kindly editor may correct the statement (coordinates, whatever) on the original wiki, as well as on Wikidata. Jheald (talk) 14:55, 11 October 2018 (UTC)
              • Typically, Wikidata coordinates are more inaccurate than the ones on en.Wiki. And as far as I know, I am the only editor on either project actually making any corrections. Abductive (talk) 18:36, 11 October 2018 (UTC)

Round 2

Since the remarks indented under what I said in no way address what I said, I'm going to repeat it, hoping to get an actual response.

What User:Matěj Suchánek says above directly contradicts what I was told on this very page barely a month ago, in a discussion that led to this change in imported from Wikimedia project (P143) as a supposed clarification. There were similar changes in several other languages. What Matěj is saying had been my previous understanding; I was kind of "newbied" and told in no uncertain terms that I was wrong. - Jmabel (talk) 20:01, 9 October 2018 (UTC)

So which is correct? - Jmabel (talk) 20:02, 9 October 2018 (UTC)

  • Maybe my memory fails me, but I may have read the same thing you wrote two years ago. Does it really matter? I think the recent change is an advantage. I tend to add "I think" to my comments, mainly to make it clear that's primarily a personal view. Someone else would just write "The recent change is an improvement mandated by ..". The user you mention is generally known for thoughtful comments. --- Jura 20:22, 9 October 2018 (UTC)
  • As you are referring to my comment, here is some clarification: imported from Wikimedia project (P143) should only be used with Wikimedia projects as values, and stated in (P248) should not (must not) be used with Wikimedia projects as values. For both reference qualifiers it is irrelevant whether they are added via a tool or bot, or manually.
    That said, let’s have a closer look at the situation: imported from Wikimedia project (P143)’s main purpose is to track provenance of data that has been batch-imported from a Wikipedia project, which is generally considered an unreliable source in the Wikimedia universe. The import script cannot verify the imported data, thus the imported from Wikimedia project (P143) reference is a marker to use this data carefully unless there is another, serious source available. Wikipedia templates do indeed consider P143-only referenced statements as unreferenced. Now if you manually import from a Wikipedia, you can also use the imported from Wikimedia project (P143) reference to indicate provenance, but it would in many cases be advantageous if you manually verified the imported value against an external source whenever one is available, and provide this directly in the references section rather than the imported from Wikimedia project (P143) “auxiliary reference”.
    For that reason, I also think that it is reasonable not to offer the reference qualifier imported from Wikimedia project (P143) too much in the GUI. —MisterSynergy (talk) 21:22, 9 October 2018 (UTC)
  • I would say that about half the time I've used imported from Wikimedia project (P143) manually it was because it was already there, but due to the limited intelligence of bots the Wikidata statement did not accurately reflect what the article said. The other half has, indeed, been to add basic data that was sitting in the Wikipedia article but not in Wikidata. Are you saying that it would be better for me not to draw facts from Wikipedia at all when I have no interest in taking the time to find more solid sources? - Jmabel (talk) 23:44, 9 October 2018 (UTC)
    • On your last sentence: no, I’m not saying this. —MisterSynergy (talk) 06:45, 10 October 2018 (UTC)
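The two rules MisterSynergy states above (P143 takes only Wikimedia projects as values, P248 never does) and the remark that P143-only statements count as unreferenced can be written down as a small check. This is only a sketch in Python under simplifying assumptions: a reference is modelled as a plain dict of property ID to item ID, the set of Wikimedia project items is supplied by the caller, and the function names and the placeholder ID `Q999999999` are invented for the example. It is not how any existing Wikidata tool or constraint works.

```python
IMPORTED_FROM = "P143"  # imported from Wikimedia project
STATED_IN = "P248"      # stated in


def check_reference(reference: dict, wikimedia_projects: set) -> list:
    """Return human-readable problems found in one reference."""
    problems = []
    for prop, value in reference.items():
        is_wikimedia = value in wikimedia_projects
        if prop == IMPORTED_FROM and not is_wikimedia:
            problems.append(f"{prop} used with non-Wikimedia value {value}")
        elif prop == STATED_IN and is_wikimedia:
            problems.append(f"{prop} used with Wikimedia project {value}")
    return problems


def counts_as_referenced(references: list) -> bool:
    """True if at least one reference does not rely on P143.

    Mirrors the remark that P143-only statements are treated as unreferenced.
    """
    return any(IMPORTED_FROM not in ref for ref in references)


if __name__ == "__main__":
    projects = {"Q328"}  # English Wikipedia; extend as needed
    print(check_reference({"P248": "Q328"}, projects))
    print(counts_as_referenced([{"P143": "Q328"}]))
```

In practice the set of Wikimedia projects would have to come from a query for instances of Wikimedia project (Q14827288) and its subclasses.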

Merging list of monuments (P1456) with appears in the heritage monument list (P2817)

I cannot see any difference between these two properties; both of them are instances of Wikidata property related to geography (Q52511956) or of a subclass of (P279) of that. At present, list of monuments (P1456) has 7,805 items and appears in the heritage monument list (P2817) has 62,689 items. Pmt (talk) 07:16, 6 October 2018 (UTC)

  • I think we could probably delete the latter. It was created for a country that used WLM lists that don't follow an administrative structure. --- Jura 08:07, 6 October 2018 (UTC)

is a Wikinews article

2011 Norway attacks (Q79967) is both instance of (P31) terrorist attack (Q5710433) and instance of (P31) Wikinews article (Q17633526) (see also the descriptions in various languages). This is an issue IMHO. Visite fortuitement prolongée (talk) 18:29, 6 October 2018 (UTC); 19:26, 6 October 2018 (UTC)

@Visite fortuitement prolongée: There are two items which match your description; which one are you talking about? Mahir256 (talk) 18:53, 6 October 2018 (UTC)
@Superchilum, Laddo: regarding 2011 Norway attacks (Q79967); @Laddo: regarding 2017 Quebec City mosque shooting (Q28549976) (as "Huhbakker" is no longer around). Mahir256 (talk) 18:57, 6 October 2018 (UTC)
Sorry, I wanted to talk about 2011 Norway attacks (Q79967). Corrected. Thank you for finding 2017 Quebec City mosque shooting (Q28549976). Visite fortuitement prolongée (talk) 19:26, 6 October 2018 (UTC)
@Visite fortuitement prolongée: Can you clarify your point? These are events, and the first articles reporting that event. They should be separated? -- LaddΩ chat ;) 22:18, 6 October 2018 (UTC)
For Q79967 Description, fr Description is "article de Wikinews", de is "Artikel bei Wikinews", an is "articlo de Wikinews", bav is "Artike bei Wikinews", bs is "Wikinews članak" etc. Very different from en "two sequential lone wolf terrorist attacks in Norway on 22 July 2011", then I guess that there is an issue. Visite fortuitement prolongée (talk) 22:46, 6 October 2018 (UTC)
Even if you don't want a separate item, at least I think that Description and instance of (P31) should not be "Wikinews article". Visite fortuitement prolongée (talk) 23:00, 6 October 2018 (UTC)
@Laddo: Yes. See the result of Wikidata:Requests_for_deletions/Archive/2018/07/08#Q17655696. Mahir256 (talk) 23:19, 6 October 2018 (UTC)

So a Wikidata item with ns0 articles on Wikipedia should not contain ns0 articles on Wikinews, but instead Wikinews categories? And all the ns0 articles on Wikinews should remain separated? --Superchilum(talk to me!) 08:12, 7 October 2018 (UTC)

Yes, I think that's the best solution indeed, if there are several Wikinews articles about the same news event. Especially the Dutch and Russian Wikinews versions work with categories in such cases. On Wikipedia, big news events have their own articles which contain all the relevant information, so categories are not needed there in that case. --De Wikischim (talk) 08:53, 11 October 2018 (UTC)
On the other hand, cases like d:Q28549976 are somewhat different. Since no Wikinews categories are involved here, all the articles (both Wikipedia and Wikinews) can just remain on the same Wikidata item. De Wikischim (talk) 08:59, 11 October 2018 (UTC)
When Wikinews is concerned, the following connections are made to other projects:
  • Wikinews article -> Wikinews article (no connection to other projects)
  • Wikinews category -> Wikipedia article / Wikivoyage article / Commons category
Ymnes (talk) 14:38, 11 October 2018 (UTC)
No they can't remain on the same Wikidata item. A mass shooting (Q21480300) is an event, not a text like a Wikinews article (Q17633526). -Ash Crow (talk) 16:27, 11 October 2018 (UTC)
Well, a lot of Wikidata items now containing both Wikipedia and Wikinews pages will have to be split up in that case. However, I wonder if it's really worth investing that much time and energy in. De Wikischim (talk) 19:31, 11 October 2018 (UTC)

Use version type (P548) as property

Currently, version type (P548) can be used only as a qualifier. What about using it as property for items that represent program versions?--Malore (talk) 23:33, 6 October 2018 (UTC)

For instance? Matěj Suchánek (talk) 08:30, 7 October 2018 (UTC)
Use « instance of » for this kind of usage. author  TomT0m / talk page 07:53, 11 October 2018 (UTC)

Template markup doesn't work in Description field

I tried to use the {{P|P...}} template within a Description and was surprised to find that it didn't work. Is this a feature or a bug? Abductive (talk) 23:42, 6 October 2018 (UTC)

@Abductive: That's right. The descriptions are intended to be re-usable in all sorts of environments beyond Wikimedia, so template markup is not supported. Jheald (talk) 23:58, 6 October 2018 (UTC)
Even plain references to other entities should be avoided if possible. Matěj Suchánek (talk) 08:31, 7 October 2018 (UTC)
Is there a « (Not so) frequently (but this still pops up from time to time) asked question » page ? author  TomT0m / talk page 08:00, 11 October 2018 (UTC)

Using IMO number for ship owners and managers

The property IMO ship number (P458) is now used in more than 10,000 items, basically ships registered with a number in the database of International Maritime Organization (Q201054). The question is: Should we extend this property to also include ship owners and managers? I.e. Wilson Ship Management (Q47149956) has the IMO number 1168545 and Norled (Q7051397) has the IMO number 1956053. Is there any good reason that we should not add these IMO numbers? --Cavernia (talk) 18:50, 8 October 2018 (UTC)

I’ve seen some identifier properties used to identify the type of item they appear on (it has this identifier => it’s a ship), but instance of (P31)/subclass of (P279) already does this. Since it’s an identifier from the same database, I’d say go for it and extend the property to cover every kind of object it identifies. author  TomT0m / talk page 08:04, 11 October 2018 (UTC)
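One practical point if the property is extended: as far as I know, IMO ship numbers and IMO company numbers use different check-digit schemes, so a single validation rule would not fit both. The Python sketch below is based on my understanding of the two schemes and not on anything stated in this thread. Ships weight the first six digits 7 to 2 and take the last digit of the sum. Companies weight them 8, 6, 4, 2, 9, 7 and use (11 minus the sum modulo 11) modulo 10. The two company numbers quoted above satisfy the second rule and fail the first. The function names are invented for the example, and 9074729 is just a sample number used to exercise the ship arithmetic.

```python
def _digits(number) -> list:
    """Split a seven-digit IMO number into a list of ints."""
    text = str(number)
    if len(text) != 7 or not (text.isascii() and text.isdigit()):
        raise ValueError("IMO numbers have exactly seven digits")
    return [int(ch) for ch in text]


def is_valid_ship_number(number) -> bool:
    """Ship scheme: weights 7..2 on the first six digits, last digit of the sum."""
    d = _digits(number)
    total = sum(digit * weight for digit, weight in zip(d[:6], range(7, 1, -1)))
    return total % 10 == d[6]


def is_valid_company_number(number) -> bool:
    """Company scheme (as assumed here): weights 8, 6, 4, 2, 9, 7, modulo 11."""
    d = _digits(number)
    total = sum(digit * weight for digit, weight in zip(d[:6], (8, 6, 4, 2, 9, 7)))
    return (11 - total % 11) % 10 == d[6]


if __name__ == "__main__":
    for number in (1168545, 1956053, 9074729):
        print(number, is_valid_ship_number(number), is_valid_company_number(number))
```

If this holds, any format or checksum constraint on IMO ship number (P458) would need to allow for both kinds of number, or a separate property would be cleaner.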

Mistakes

One of the problems of assuming that bodies like Historic England (formerly English Heritage) have got it right is that we propagate their mistakes here and onwards. This, for example, is not on Slately Road, it's on Slatey Road, and I've had to email a Minor Amendment to them. Will it be picked up here? Their geocoords are also often inaccurate. There's also no point using their titles here since they are not unambiguous, as ours are forced to be. "Lodge" is a common term they use, but we use "Lodge of Allerton Manor" on Commons. Could we have a bit more due diligence please, rather than the current obsession with quantity over quality? Thanks. Rodhullandemu (talk) 12:35, 9 October 2018 (UTC)

@Rodhullandemu: I too think more meaningful titles here would be useful, but there has been debate about this, e.g. as to whether "Church of St Mary" or "St Mary's, Little Snoring" is preferable. I would prefer the latter (close to how Commons names its categories; less likely to be confused; and easier to select correctly from a drop-down list), but there is a significant constituency of opinion for the former. (Note also in passing that the title does not in fact have to be unique on Wikidata (unlike Commons) -- on Wikidata the actual restriction is that an item has to have a unique combination of title + description.)
On the second point, quantity probably comes first; then scripts etc can be used to compare the data from different sources and identify anomalies. A mistake from a body like Historic England should be noted here with reference (including url and date), but ideally marked down to rank=deprecated, with a qualifier like reason for deprecation (P2241) = error in referenced source (Q29998666). This (a) helps when checking what has and hasn't been imported from the database; and (b) can help us if we are drawing from further sources, that in turn may have drawn their data from Historic England. And (c) it gives us an easy way with a query to pull out a catalogue of issues we think we may have found with a particular source, that we can later check to see whether they may have updated. Jheald (talk) 09:04, 10 October 2018 (UTC)
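Point (c) above, pulling out a catalogue of suspected source errors, could look roughly like the query built by this Python helper. It is a sketch under assumptions: that the deprecated statements carry reason for deprecation (P2241) = error in referenced source (Q29998666) as described, and that the source is recorded with stated in (P248) in the reference, which may not be how a given import was modelled. The function name and the QID passed in the usage line are placeholders, since the item for the source database is not given in this thread.

```python
import re


def source_errors_query(source_qid: str, limit: int = 200) -> str:
    """Build a SPARQL query listing deprecated statements blamed on one source."""
    if not re.fullmatch(r"Q[1-9]\d*", source_qid):
        raise ValueError(f"not a valid item ID: {source_qid!r}")
    return f"""
SELECT ?item ?itemLabel ?statement WHERE {{
  ?item ?claim ?statement .
  ?statement wikibase:rank wikibase:DeprecatedRank ;
             pq:P2241 wd:Q29998666 ;
             prov:wasDerivedFrom/pr:P248 wd:{source_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
LIMIT {int(limit)}
""".strip()


if __name__ == "__main__":
    # Q123456789 is a placeholder, not the real item for any database.
    print(source_errors_query("Q123456789"))
```

Re-running such a query after the source publishes corrections gives a checklist of statements to revisit.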
It seems you also forgot to mention the use of Alias for this kind of usage, a string like « St Mary's, Little Snoring » can definitely be put in an alias to help find this item. author  TomT0m / talk page 08:09, 11 October 2018 (UTC)

Suitability of dataset for Wikidata

Hi. The National Library of Wales is planning on sharing a dataset, extracted from the Welsh Book of Remembrance. It contains data on 16,000 soldiers who died in action in the First World War. At a minimum these entries contain names, date and place of death and their regiment, but many also contain dates of birth, details of the war monument their names are recorded on and other biographical data. We are considering Wikidata as a possible home for this data, which was transcribed and enriched by volunteers, but I wanted to see how the community felt about adding such a large volume of relatively obscure people to Wikidata, although obviously there is great potential to link people to regiments, war memorials and battle sites. I would appreciate any feedback. Thanks! Jason.nlw (talk) 15:20, 9 October 2018 (UTC)

Use that dataset to complete data on existing items. The existence of a person in that dataset is not sufficient to justify the creation of an item, but for each person who already has an item and appears in that dataset, the corresponding data can be added to WD. As I see the data addition, each year items from WD should be compared with the dataset, and if a match is found the data are transferred from the dataset to WD. During that annual check, previously added data could be checked to ensure data integrity. Snipre (talk) 16:46, 9 October 2018 (UTC)
A previous discussion at Wikidata:Project_chat/Archive/2018/07#Scholarly_projects_and_notability_policy and deletion request concluded that even an item like Nathaniel Oldham (Q55445723) is notable enough for Wikidata, because the person has been listed in an external database. The Welsh soldiers seem like an improvement on that one. Ghouston (talk) 19:49, 9 October 2018 (UTC)
  • For some of the entries, you might find additional elements that make them worth including. Personally, I'd find it interesting if you could generate counts for first names and surnames in these lists and add at least those. Similar to the ones we have for Rotterdam/the Netherlands, e.g. at Q4925477#P5323. I can add them to items if you post them somewhere on a Wikidata page. --- Jura 06:04, 10 October 2018 (UTC)

Thanks everyone. @Jura1: If we do go down the Wikidata route I will definitely share the data with you for the name counts. It sounds from the test case highlighted by Ghouston that this data would be deemed acceptable - it looks as though we have the CWGC person ID (P1908) for each person too, so each person could be linked to at least 2 external databases - Commonwealth War Graves and The Welsh Book of Remembrance. We would obviously be careful to match to people who already exist in Wikidata; however, I think if we were going to do this the Library would want to share the complete set, rather than just the richest parts of it. The benefit to the institution comes from being able to offer users full access to the data via Wikidata (there will also be a data dump on GitHub, I think). Obviously we can also make much better use of queries and visualizations if the data set is complete. Jason.nlw (talk) 08:22, 10 October 2018 (UTC)

  • I think it's worth discussing. w:WW1 notes some 9 million soldiers who died in WWI. Obviously, they all have some identifier. I don't think that makes them notable as such. --- Jura 08:53, 10 October 2018 (UTC)
Yes, this definitely needs discussing, and I'm grateful for all the input so far. I understand that many of these people might not be notable in the Wikipedia sense of the word, but does Wikidata really follow the same guidelines? In past discussions I've been told that Wikidata will generally accept all external datasets as long as they are deemed useful - something that links to, and enriches, other data. For a lot of people, having a single database of all 9 million WWI soldiers as part of a larger dataset, with linked data on regiments, places etc., would be an exciting prospect. Take the sum of all paintings project for example. Is a painting notable in itself for being in a museum? Some are for sure but many are not. Yet it is the collection of all those items together that has real value (for research and discovery), rather than each individual piece... Well, that's what I think anyway! Please discuss. Jason.nlw (talk) 10:27, 10 October 2018 (UTC)
    • @Jason.nlw: Your own wikibase offering its own federated SPARQL query service might also be a possibility. I believe there's been quite a lot of work put in to try to make it easier for GLAMs to set up and run such a service, and various GLAMs that may be exploring this possibility for different projects. It might be a valuable thing for NLW to run as a pilot, to explore the technology. Not sure if there is a user group. Jheald (talk) 09:12, 10 October 2018 (UTC)
Hi Jheald, thanks for this. Yes, this is definitely where we want to get to; however, our current level of resources means we simply don't have the capacity to set this kind of thing up, which is one of the reasons we have been so keen on using Wikidata - because the infrastructure is there already! And we have repeatedly found that sharing with Wikidata adds value to our data, as we can pull back data like coordinates, external identifiers and additional biographical information. Jason.nlw (talk) 10:27, 10 October 2018 (UTC)
@Jason.nlw: Understood. And of course, it's not just the setting up of your own wikibase, it would be the maintenance and the keeping of it updated, etc which requires ongoing resources. And as you say, it's lonely being away from the main community; even if you can match items to Wikidata, and share Wikidata through federation, you don't get the incidental improvements from bystanders, and all the data you pull back you have to do yourself.
But still, the issue of secondary databases, as hub-and-spoke satellites to Wikidata, is one that a number of projects are coming to have to look at, as they start to strain WD:Notability -- for example, the current discussion in WikiCite. So there are others that would be facing this too. And, in the same way that one can rent a wiki from a serviced wiki-farm in the cloud, it may well be that there is a similar availability of serviced wikibases, with somebody else managing the setup and the servers and the maintenance and the query service, so that NLW as an end-user might just be able to sign a turn-key contract, and then start putting in data. Jheald (talk) 12:23, 10 October 2018 (UTC)
@Jason.nlw: It's also possible you might be able to get a consortium of GLAMs together behind a UK biographical wikibase -- there might be several cases of datasets such as yours containing people not considered to fulfill WD:N, but that it would be useful to have in an ecosystem attached to Wikidata. Jheald (talk) 12:29, 10 October 2018 (UTC)
Thanks Jheald, there are some really interesting ideas here. In Wales, for example, there is now a big consortium of libraries using the same catalogue system, so perhaps having a wikibase as some kind of extension to that could be a way forward, and a way of opening up the data of smaller libraries, who may not even have heard of Wikidata! Jason.nlw (talk) 09:02, 11 October 2018 (UTC)
    • Personally, I don't think adding these at scale is a good idea. While it might technically comply with the notability policy, it's really pushing the spirit of it - eg in theory we could describe everyone alive in the UK or US in certain years using public census data to create WD items, but we'd probably agree that was too much.
In the example Ghouston linked to, while the deletion discussion passed, the discussion on project chat was more nuanced and suggested that this is a good case where a separate database would be able to give the information they want. It's worth noting that in that case, those two test items were the only ones created - it didn't proceed further - so I wouldn't take it as unqualified endorsement/precedent for this being a good approach. Andrew Gray (talk) 11:29, 10 October 2018 (UTC)
  • Thanks Andrew, I appreciate your input. I take your point about the cited test case, and I wouldn't want to try and force anything through based on that alone. However, what is becoming clear is that we do not have any kind of community consensus on what should be allowed on Wikidata, beyond the WD:N guide, partly because the project is developing so fast, and so its purpose/mission too is ever developing. Technically this data meets points 2 and 3 of WD:N. However, the guide leaves a lot open to interpretation, perhaps deliberately, as Wikidata tries to find its purpose. Personally I think Wikidata should support and follow Wikipedia's goal of providing access to the sum of ALL human knowledge, and when it comes to data, shouldn't ALL mean ALL? Jason.nlw (talk) 09:20, 11 October 2018 (UTC)
If something technically complies with the notability policy, but isn't wanted on Wikidata, maybe the notability policy should be changed. I'm personally in favor of data donations to Wikidata, for the reasons stated above that a more robust dataset is more useful for queries and users. If Wikidata ends up adding census data I would not be opposed to that. Rachel Helps (BYU) (talk) 15:58, 10 October 2018 (UTC)
  • In my opinion, this dataset should be added to Wikidata. It has serious references and can even be linked to a second database. The fear of some users of big datasets is mostly unfounded; for example if one looks what is going on related to scholarly article (Q13442814) one sees that Wikidata can deal smoothly with datasets in the order of 10M. --Pasleim (talk) 20:21, 10 October 2018 (UTC)

Wikisource: Notability of subpages of mainspace pages

Wikidata:Notability 1.5 states „The status of subpages of mainspace pages (for example, individual chapters) is undetermined”. This is from 2014. Is there any progress on this? --Succu (talk) 19:42, 9 October 2018 (UTC)

works of authors

How come Wikidata doesn't include works (books) of authors? As far as I remember Freebase did this and had a huge catalogue of author related works. Or did I only not find them? --Rabenkind (talk) 20:56, 9 October 2018 (UTC) Add: why do I ask this: It makes it simpler to have a list of all books published and look for translations of those books. --Rabenkind (talk) 21:01, 9 October 2018 (UTC)

@Rabenkind: It might help if you mentioned which books by which author you're looking for; as long as you're not self-promoting or spamming your or someone else's books on Wikidata and the books are either 1) mentioned in an external database (VIAF, LCCN, GND, etc.) or 2) used as a reference to a claim on Wikidata, then those books and authors are eligible for Wikidata items. Mahir256 (talk) 21:35, 9 October 2018 (UTC)
It's more about how to extract all books (in all known languages) by a certain author, such as George Orwell. As I found out that 'Animal Farm' is linked to Orwell, I was trying to use the Query Service to get that working, without success so far. --Rabenkind (talk) 21:46, 9 October 2018 (UTC)
@Rabenkind: Write a query with instance of (P31) = written work (Q47461344) AND author (P50) = George Orwell (Q3335). Snipre (talk) 22:45, 9 October 2018 (UTC)
Something like
SELECT ?h ?hLabel
WHERE
{
  # instance of (P31) written work (Q47461344)
  ?h wdt:P31 wd:Q47461344 .
  # author (P50) George Orwell (Q3335)
  ?h wdt:P50 wd:Q3335 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Or
SELECT ?h ?hLabel
WHERE
{
  # instance of (P31) book (Q571)
  ?h wdt:P31 wd:Q571 .
  # author (P50) George Orwell (Q3335)
  ?h wdt:P50 wd:Q3335 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Snipre (talk) 22:56, 9 October 2018 (UTC)

That's what I was looking for - thanks a lot! --Rabenkind (talk) 07:31, 10 October 2018 (UTC)
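
A possible refinement of the queries above, offered as a sketch rather than a definitive answer: many of an author's works are typed as something more specific than written work (Q47461344) or book (Q571), so matching on instance of (P31) alone can miss them. Assuming those more specific classes are connected to written work through subclass of (P279), a property path picks them up as well. The variable names are arbitrary, and the class hierarchy on Wikidata may have changed since this discussion.

```sparql
SELECT DISTINCT ?work ?workLabel
WHERE
{
  # author (P50) George Orwell (Q3335)
  ?work wdt:P50 wd:Q3335 .
  # instance of (P31) written work (Q47461344) or any of its subclasses (P279)
  ?work wdt:P31/wdt:P279* wd:Q47461344 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

Replacing wd:Q3335 with another author's item should list that author's works in the same way.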

Succu problems again

See this edit.

Followed by my explanation why the edit was wrong.

Followed by Succu making the same bad edit again through a revert to restore his bad edit rather than engage in a discussion to find a workable solution. --EncycloPetey (talk) 05:36, 10 October 2018 (UTC)

Sorry, I saw your entry at my talk page after my revert. And I saw this entry only by chance. More later, EncycloPetey. --Succu (talk) 05:44, 10 October 2018 (UTC)
I not only left a note about the matter on your Talk Page, but also explained the issue in the edit comment with a link to the community discussion on the matter. Are you saying that you missed both messages, or that you reverted first out of habit? --EncycloPetey (talk) 11:04, 10 October 2018 (UTC)
  1. Please avoid offensive headings
  2. Please assume good faith (that you reverted first out of habit)
  3. Please provide a direct link to the item you are talking about (Flora Antarctica (Q6435950))
  4. You referred to the ongoing discussion "We need a clear model in WD if we want to follow a FRBR structure", pushing your POV
  5. It looks like Help:Sources/de#Bücher (instance of book (Q571)) and Help:Sources#Books (instance of written work (Q47461344)) are out of sync
--Succu (talk) 20:22, 10 October 2018 (UTC)
  1. The heading is not offensive.
  2. I cannot assume good faith when you repeatedly respond to edits by immediately reverting. Further, you claim not to have seen the responses I made to you, but managed to pattern your comment on the revert to match the comment I made but which you claim you did not see. This breaks the assumption of good faith.
  3. I provided links to the diffs on the item.
  4. The discussion is not ongoing. Discussion was concluded in August (two months ago) and has moved on with changes already being enacted based on the consensus. Nor is it "my POV", but the consensus of Wikiproject:Books. I neither started nor coordinated the discussion, and I was far from the only participant as can be seen by the numerous comments in that discussion from multiple commenters.
  5. Yes, the German translation is out of sync, which is why I directed you to the primary discussion.
  6. In future please read comments provided for your education. We are all in this together, and the comments were provided to bring you up to speed on current consensus. Reverting first and reading later is a poor way to do things; in particular, it does not assume good faith. --EncycloPetey (talk) 22:24, 10 October 2018 (UTC)

Federated search with data.nobelprize.org: do we have any experience/design patterns for using information like this in an article?

We can now do a federated query to data.nobelprize.org link list - link map and get back

  • Nobel Lecture
  • Biographical
  • Banquet Speech
  • Prize motivation
  • Documentary
  • Diploma

using SPARQL (see tweet). My question is: has anyone tested a Wikipedia template that extracts information from a federated query and presents it in a wiki article, and do we have any good pattern for that, such as caching the data? - Salgo60 (talk) 09:01, 10 October 2018 (UTC)

@Salgo60: In general, Wikipedia templates can't be filled from a SPARQL query -- not even from our own WDQS service. Instead they draw from the linked Wikibase directly. It is possible to write Javascript gadgets to run a query and then insert something that looks like a template on a page, but it's not built into the template system, and it's slow and resource-expensive -- at best, that kind of thing is a hack that can be used in very limited circumstances (eg perhaps a gadget of use to a very few power users), but not appropriate for general deployment. Jheald (talk) 09:21, 10 October 2018 (UTC)
Thanks for the clarification. Then my question is: shouldn't this be the direction Wikipedia/Wikidata moves in, or should everything be stored in WD? I guess a good pattern is to have some kind of caching to get better performance - Salgo60 (talk) 09:27, 10 October 2018 (UTC)
Technical questions aside, is the data from that website even available under a free license? Because their Terms of Use say otherwise. Without a compatible license, it simply can't be used on Wikipedia, whether we want to or not. --Kam Solusar (talk) 20:59, 10 October 2018 (UTC)
Good point. My initial thought was a "Nobel prize Infobox" with links to Nobel Data ==>
  • Nobel Lecture
  • Biographical
  • Banquet Speech
  • Prize motivation
As said earlier, Nobel Data is not a problem per se. The challenge I see is WikiDataSync: how to keep Wikidata updated. I guess we need to gather good examples and, at least in Sweden, teach data owners what is expected of them when they manage Linked data. I miss good examples of Open government data with good change management working together with Wikidata - Salgo60 (talk) 05:04, 11 October 2018 (UTC)

Wikidata Synch - Notification

Another thing is that with queries like this we could easily find out when a new Nobel Prize winner is added (see tweet), and it would be nice to have a subscription service that notifies me when something has changed - Salgo60 (talk) 09:01, 10 October 2018 (UTC)

Noticing that a new Nobel Prize has been awarded does not seem like a problem that needs a database solution. Plumbum208 (talk) 09:53, 10 October 2018 (UTC)
The question is more about finding good patterns. Since Nobeldata has SPARQL and an API, and is a rather small dataset with very high visibility that everyone understands, I guess it is a good candidate for creating good "patterns".
In Sweden we have the following "cleaning" projects that will be a nightmare to keep "in shape"
To be able to manage them, I guess it would be best if we could tell the owners of the information that Wikidata prefers this pattern for keeping Wikidata in sync with the data they have. My belief is that SPARQL is a very good pattern for easily maintaining data quality against external sources - Salgo60 (talk) 11:37, 10 October 2018 (UTC)
Create a report (see for example Wikidata:WikiProject sum of all paintings/Wiki monitor/nlwiki) and add it to your watchlist. Multichill (talk) 19:52, 10 October 2018 (UTC)
Yes, Listeria is the best pattern I have seen so far for synchronizing Wikidata with external datasets, e.g. Property_talk:P1006/Mismatches - Salgo60 (talk) 05:04, 11 October 2018 (UTC)
Thanks, I made a list, d:User:Salgo60/ListeriaNobelData3, that I feel will do the job - Salgo60 (talk) 11:21, 11 October 2018 (UTC)

It looks like Metadata2020 and the FORCE2018 conference are talking about the challenges with metadata and machine-actionable data management - Salgo60 (talk) 06:36, 11 October 2018 (UTC)

Reminder about birth-death dates

Hi all, I spend time now and then filling in death dates for women, but of course this problem is true for all people, female, fictional and otherwise. Please try to be more exact in filling in dates. Someone who was obviously born after 1950 should at least have a birthdate with precision of a decade and not century (which reverts to 1901s). See this link to help work on the century of your choice: Wikidata:WikiProject Women/Centenarians. Thx Jane023 (talk) 14:04, 10 October 2018 (UTC)

If the year of birth isn't publicly available, the decade probably isn't either. I don't think guessing is a good idea. Ghouston (talk) 22:13, 10 October 2018 (UTC)
Guessing may be a bad idea, but unfortunately our sources have already done that and the result is we are left with lots of items with very strange birthdates - some are even set to the year 0 or 100. The 1901s is just an example. Jane023 (talk) 17:13, 14 October 2018 (UTC)
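
One way to find such items with the Query Service is to filter on the precision stored with the date. This is a minimal sketch, assuming the Wikibase convention that precision 7 means century, 8 decade and 9 year. The restriction to humans (Q5) with sex or gender (P21) female (Q6581072) is only an example, and the LIMIT is there because an unrestricted query of this kind may time out.

```sparql
SELECT ?person ?personLabel ?birth
WHERE
{
  # humans (Q5) with sex or gender (P21) = female (Q6581072)
  ?person wdt:P31 wd:Q5 ;
          wdt:P21 wd:Q6581072 ;
          # full value node of date of birth (P569), which carries the precision
          p:P569/psv:P569 ?birthNode .
  ?birthNode wikibase:timeValue ?birth ;
             # 7 = century precision (8 = decade, 9 = year)
             wikibase:timePrecision 7 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```

Changing the precision value to 8 would list dates stored with decade precision instead.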

Merge two items

Could anyone please merge Q17003641 and Q20113514? --193.157.194.219 07:32, 11 October 2018 (UTC)

Could anyone please merge Q11980054 and Q24511179? --193.157.194.219 10:06, 11 October 2018 (UTC)
The geo coordinates for these two items point to two different fjords. [1] and [2]. I am not sure they should be merged. — Finn Årup Nielsen (fnielsen) (talk) 17:08, 11 October 2018 (UTC)
✓ Done for the first two, the second two are still problems. --Liuxinyu970226 (talk) 11:10, 12 October 2018 (UTC)

How to indicate the person who destroyed a work?

I encountered Un bon bock (Q2000314), a kind of early film that utilises the procedure of Théâtre Optique (Q2709131). It was, along with most of his other ribbons for the theatre optique, destroyed by its creator Charles-Émile Reynaud (Q286445), who smashed it and threw it into the Seine (some sources describe it as an act out of mental derangement or depression). How to express this?

I thought of using significant event (P793)/destruction (Q17781833) with qualifiers. To indicate the person I found three already used ways, none of them very convincing:

I like participant (P710) best, but does this exclude other ways of participating in this event (like trying to save it)? cause of destruction (P770) and killed by (P157) are a bit weird due to the domain/value (and I would say the reason for destruction of Un bon bock (Q2000314) is something like smashing, maybe even mental depression (Q4340209)). Is there a better way for modelling it?

To indicate that he threw it into the Seine I would just use location (P276) (although this is not very accurate, as he threw the ribbons into the Seine after he destroyed them, but this might be negligible).

Btw: That Un bon bock (Q2000314) was intended for Théâtre Optique (Q2709131) I expressed via Un bon bock (Q2000314)uses (P2283):Théâtre Optique (Q2709131). I'm not sure if this is the best way to put it. - Valentina.Anitnelav (talk) 08:45, 11 October 2018 (UTC)

Maybe one could also use significant person (P3342) qualified (object has role (P3831)) with some newly created item like <destroyer>. -Valentina.Anitnelav (talk) 09:53, 11 October 2018 (UTC)
There's also has cause (P828) as a qualifier of dissolved, abolished or demolished (P576), which I don't like so much. The area of destructions in general is quite inconsistent... --Yair rand (talk) 16:45, 11 October 2018 (UTC)

Calandrinia corrigioloides

This discussion on User:Peter coxhead's English Wikipedia talk page, is pertinent to issues with Wikidata's handling of items about species. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:14, 11 October 2018 (UTC)

a way to protect?

I have Romeo and Juliet (Q83186) on my watchlist, and I can easily remove it so feel free to tell me NO.

Every day, or at least often, some IP makes a wrong edit to it, one edit only. Can it be protected because of "stupid uncalled for gaming" or whatever?--RaboKarbakian (talk) 16:14, 11 October 2018 (UTC)

Looks like 6 edits in the last 2 weeks, but not too frequent before that. You might want to request this on the Wikidata:Administrators' noticeboard. ArthurPSmith (talk) 18:25, 11 October 2018 (UTC)
I've protected it for a month. If it's still attracting vandalism after that, let us know on the page ArthurPSmith linked above. - Nikki (talk) 18:48, 11 October 2018 (UTC)
The main page highlights Romeo and Juliet in the "Discover" section, but doesn't link to the item directly. Could this be the cause somehow? --Yair rand (talk) 20:30, 11 October 2018 (UTC)

Political movement or ideology

There is a question between me and @Fnielsen: about Lars Hedegaard (Q1806231), who belongs to the Counterjihad (Q3374768) (or Counter-jihad movement, or CJM). The CJM is currently a political movement (Q2738074) in Wikidata, and is described by scholars (see Q3374768#P1343) as a loose network of people and organisations. Is the CJM an organization (Q43229), a political movement (Q2738074) or a political ideology (Q14934048)? Main question: how does Wikidata say that somebody belongs to the CJM: member of (P463)? political ideology (P1142)? Something else? Visite fortuitement prolongée (talk) 16:48, 11 October 2018 (UTC)

This problem concerns not only persons but also organizations such as International Free Press Society (Q4354370). To complicate matters further, there is also the possibility of invoking affiliation (P1416) (beyond member of (P463) and political ideology (P1142)). — Finn Årup Nielsen (fnielsen) (talk) 17:02, 11 October 2018 (UTC)
Both affiliation (P1416) and member of (P463) seem to be for organisations, and Counterjihad (Q3374768) doesn't seem to be an organisation. I suppose it's correctly a political movement (Q2738074) and political ideology (P1142) should be used. I'm not sure what the difference is between political movement (Q2738074) and political ideology (Q14934048). Ghouston (talk) 19:55, 11 October 2018 (UTC)
There may be a distinction between political movement (Q2738074) and political ideology (Q14934048) (they both have articles on cswiki), in which case using political ideology (P1142) with a political movement (Q2738074) could be a mistake. I don't know how you decide whether a particular instance such as Counterjihad (Q3374768) is one or the other, or what property you should use with a political movement (Q2738074). Ghouston (talk) 20:53, 11 October 2018 (UTC)
There are a lot of small local political movements about things like preventing high-rise buildings or giving tenants the right to own pets which can't really be described as ideologies. An ideology is some kind of grand scheme about how politics or society should be organized, such as communism (Q6186) or conservatism (Q7169). Presumably political ideologies will also be political movements, or have associated political movements. In the past I suggested (in jest, but maybe it's not a bad idea) properties like "supporter of" and "opponent of" to allow recording a person or organization's opinions about random things. That discussion was about atheism (Q7066) and whether or not it's a religion, but it could even be used to record a sports team somebody supported. Ghouston (talk) 21:06, 11 October 2018 (UTC)

Items created unintentionally twice on several occasions

Hello I noticed twice the following behaviour:

When creating a new item by clicking the 'In Wikipedia Add links' button in the sidebar on Commons, 2 items were created although I clicked just once. This happens every now and again, without my being able to tell which circumstances lead to it. Moreover, the 2 items created have consecutive item numbers each time. The last time I noticed this was earlier today:

You will also notice that both items were created in the same second, so I think I can rule out that I performed 2 actions.

the same happened as well a week ago:

and as well a few minutes earlier here:

and the day before:

as well as twice here:

as well a week earlier:

and the first time here:

Does anyone have an explanation for this behaviour? I suggest creating an issue in Phabricator to allow deeper investigation.

Many thanks for any feedback on this issue. Robby (talk) 22:13, 11 October 2018 (UTC)

Many tools use SPARQL queries for duplicate detection, and when there is a lag on the server used by such a query, then duplicates may arise. Such lags have occurred multiple times during the relevant period, and I am not sure what causes them. --Daniel Mietchen (talk) 01:35, 12 October 2018 (UTC)
Particularly worrying is the fact that the sitelinks are present *both* in Category:1990s in the Balearic Islands (Q57215615) and no label (Q57215616) - something I thought was impossible, as sitelinks are supposed to be unique. Pinging @Lydia Pintscher (WMDE). Thanks. Mike Peel (talk) 06:32, 12 October 2018 (UTC)
These are so-called true duplicates. --Pasleim (talk) 08:57, 12 October 2018 (UTC)
Any suggestions on how to proceed with this?
  • Create a new Phabricator ticket, or is there already an open issue on this in Phabricator?
  • Update the list of true duplicates, or first establish a complete list of all these true duplicates? (My knowledge of SPARQL does not allow me to write a query to generate such a list, and I do not even know whether this would be possible.)
  • merge the duplicates and take no further action?
Thanks for further feedback and/or proposals Robby (talk) 21:13, 12 October 2018 (UTC)
It looks like the developers will run a script to update the list soon, see Wikidata:Contact_the_development_team#True_duplicates_clean_up?, and then we can merge them. There's a Phabricator ticket linked from there too. - Nikki (talk) 12:29, 14 October 2018 (UTC)
thanks for this update. I've added a comment in the corresponding ticket in phabricator. Robby (talk) 13:44, 15 October 2018 (UTC)
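
For checking a single suspected pair, a query along the following lines might help. It is only a sketch: it assumes both items still exist unmerged, and it uses the two item IDs mentioned above as an example. It lists every sitelink that is attached to more than one of the given items, which should normally return nothing because sitelinks are supposed to be unique.

```sparql
SELECT ?sitelink (COUNT(DISTINCT ?item) AS ?itemCount)
WHERE
{
  # the suspected pair of true duplicates (example IDs from this thread)
  VALUES ?item { wd:Q57215615 wd:Q57215616 }
  # sitelink pages point at their item via schema:about
  ?sitelink schema:about ?item .
}
GROUP BY ?sitelink
HAVING (COUNT(DISTINCT ?item) > 1)
```

Dropping the VALUES line would in principle search all items, but a query over every sitelink is likely to time out on the public Query Service, so a complete list probably has to come from the developers' script.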

Why did Donna Strickland (Q56855591) not have a Wikidata item when her Nobel Prize was announced?

Over on the English Wikipedia, there is an insightful essay about the mechanisms that led to her not having an article there, but no Wikipedia had an entry, and neither had Wikidata, so I am wondering what we can learn from that for Wikidata-related workflows. --Daniel Mietchen (talk) 01:28, 12 October 2018 (UTC)

Like many authors of academic papers, we had items about her papers, with her name as a author name string (P2093) value (see the histories of Q33306866, Q33335086, Q35516328, Q36021270, for example) but our automated tools had, presumably, not been able to create an item about her, as her ORCID profile has "No public information available". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:22, 12 October 2018 (UTC)
Lile ORCID, which is as trustable as IMDB, is the only criteria... Sjoerd de Bruin (talk) 23:59, 12 October 2018 (UTC)
"Lile"? ORCID iDs are certainly more trustable than IMDB, in this context. I said nothing to claim that they are "the only criteria [sic]", but they are the primary ID used by our automated tools to create such items. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:35, 13 October 2018 (UTC)
  • Maybe she was probably just younger than the usual winners? --- Jura 16:38, 12 October 2018 (UTC)
  • Two further factors, besides presumably gender bias: (1) her unusual decision to never have applied for a full professorship and (2) the lack of a Wikipedia article in no small part because of what I would consider an over-strict application of the idea of "independent" sources, where we don't consider an accredited university "independent" enough to be citeable about their own faculty. - Jmabel (talk) 19:59, 12 October 2018 (UTC)
  • Much of Wikidata is created from pre-existing databases. The unfortunate consequence of this is that any bias in the source databases will be reflected in Wikidata. Where gender bias exists in the source databases, it takes proactive editing to overcome that bias. I have repeatedly been shocked to find that novels by leading women--even Pulitzer winners--have not had Wikidata entries, but on searching further I have often found that these same novels were also absent from VIAF and the Library of Congress. So unless Wikidata editors are actively seeking to counter pre-existing bias, we can expect the default to be a reflection of whatever bias was present in our sources. --EncycloPetey (talk) 17:44, 13 October 2018 (UTC)
  • From a knowledge perspective, I think the Wikidata situation of Q905436 in 2017 or even now is more problematic. --- Jura 17:49, 13 October 2018 (UTC)
    It's not much worse than stable marriage problem (Q620702), which is another Nobel winner topic. --EncycloPetey (talk) 17:53, 13 October 2018 (UTC)
    If they were done correctly, the other questions probably wouldn't arise. --- Jura 12:55, 14 October 2018 (UTC)

Data loss during the data centre switchover

Hello all,

During the data centre switchover routine on October 10th, some unexpected problems occurred over the past days:

  • For a few hours, a small part of the data was not accessible. Some items and lexemes seemed to have disappeared.
  • Some data may have been lost, including edits, preference changes and account creations made during a period of about 50 minutes (from 2018-09-13 09:08:17 UTC to 2018-09-13 09:58:26).

Part of the data has already been restored (edits and revisions). The rest (user accounts, preferences) will be restored at the beginning of next week.

If you edited Wikidata on September 13th, please check your contributions. If you encounter any problem in the next days, like items not reappearing or something missing, let me know.

If you're interested in technical details, you can have a look at the Phabricator ticket. Thanks for your understanding, Lea Lacroix (WMDE) (talk) 14:45, 12 October 2018 (UTC)

Multiple sandboxes for training

I'm going to be delivering a couple of training sessions on Wikidata in the near future: October 20 in Cambridge, and October 25 for Coventry University. I often do a brief introduction to Wikidata when doing Wikipedia training, but these two events will focus on Wikidata. I always prefer to have participants actively trying out editing, rather than just hearing about it, so my usual teaching involves them working in their sandboxes for Wikipedia training. I would very much like to develop active participation in these events, but on Wikidata, users don't have user-sandboxes in their user space. I'd therefore like them to each create one or more "sandbox-items" in mainspace that they can then edit for practice. For example: user:Fred may create Sandbox-Fred Coventry and make test edits there as if it were the Coventry Wikidata entry, without disturbing the real item. Each participant would have their own item(s) to avoid the confusion of multiple people editing the same item, as they would if we used Wikidata Sandbox (Q4115189). After the event, I would request deletion of all the "Sandbox-" items.

Now, would there be any objections to this scheme? Are there issues or complications that I haven't foreseen? Does someone know of a better way? I'm interested in any and all feedback. Cheers --RexxS (talk) 16:18, 12 October 2018 (UTC)

  • We do have three sandboxes. Maybe we could have a few more. The advantage of them being stable is that users don't get confused about the edits. Alternatively, maybe we could generate lists of potential items that could easily be created by new users. --- Jura 16:28, 12 October 2018 (UTC)
    Thank you, Jura. I'd be really interested in generators for lists of potential items for creation by new users, but I'd want that as an addition for the second lesson, after participants are comfortable with the interface. Cheers --RexxS (talk) 16:42, 12 October 2018 (UTC)
  • This seems reasonable, but do you have a mechanism to clearly identify these sandbox items, so they don't cause trouble later? Maybe encourage everybody to make them instance of (P31) Wikidata Sandbox (Q4115189), or add that when you become aware of one? ArthurPSmith (talk) 18:52, 12 October 2018 (UTC)
    @ArthurPSmith: Hopefully, I'll have three mechanisms:
    1. I intend that each item should take the form <"Sandbox-"><name-of-user><" "><name-of-item>. Currently no item begins with "Sandbox-", so they should be easy to track down afterwards.
    2. I will be asking the participants to post a message on my talk page (to experience talk page conversations). That will allow me to scan their contributions and I'll quickly spot any edits to items not fitting the pattern I prescribed.
    3. I usually have some of the session time where participants work freely on a topic that interests them, and I work my way round everybody, helping out and checking on what they are doing. That tends to be a safety-net where I can spot those who have problems (and they are the ones most likely to create items outside of the pattern).
    It's not foolproof, but hopefully will avoid leaving things for others to have to clean up. --RexxS (talk) 20:03, 12 October 2018 (UTC)
    • Oppose as far as I'm concerned. --- Jura 20:08, 12 October 2018 (UTC)
      Care to elaborate? --RexxS (talk) 20:47, 12 October 2018 (UTC)
      • If you occasionally create test items with random statements, any Wikidata user who queries the database can end up getting them in their results and polluting them. In addition to my first suggestion, you could also create random items at https://test.wikidata.org --- Jura 20:54, 12 October 2018 (UTC)
        I can see the issue with potential result pollution, even for a brief session, so I'm sympathetic to that point. I agree that it wouldn't scale well. I've always shied away from using test.wikidata.org because of the possibility that the interface may be altered significantly by testing, but it looks quite comparable right now. I don't know anybody who has successfully used it for classroom work, but I think it's definitely worth trying out for at least one event. Thanks for the tip. --RexxS (talk) 22:32, 12 October 2018 (UTC)
  • I don't see an issue since you plan to RFD them after the event. I'm not sure I'd be particularly bothered without that caveat either, but I haven't thought it all through. --Izno (talk) 20:51, 12 October 2018 (UTC)
  • If the existing sandboxes aren't enough, then I think https://test.wikidata.org/ would be better, because people can happily test without messing up the live data and without having to worry about being reverted or even blocked or the items being deleted while they're trying to test. - Nikki (talk) 09:30, 13 October 2018 (UTC)
  • I don't see a good reason why you have to train with fake data. If you take real data that is currently missing from Wikidata, it would be more motivating for the people in your classroom.
You might for example take books that are currently in Wikidata that are only tagged as books and that have ISBN numbers and then ask your students to fill the items with more true information. ChristianKl❫ 15:03, 13 October 2018 (UTC)
  • For test.wikidata.org , you might need to define some items or properties before the course. (Maybe the bot running there could be configured to do that on each reset).
    The idea of looking for items with few statements seems like a good one as well, e.g. http://petscan.wmflabs.org/?psid=6118865 . You could also use categories with articles that need items. --- Jura 14:15, 14 October 2018 (UTC)
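
If practice items are created on the live site after all, the two tracking ideas from this thread (the "Sandbox-" label prefix and marking items as instance of (P31) Wikidata Sandbox (Q4115189)) could be combined in a clean-up query. This is a sketch under those assumptions only: it presumes the participants actually added the P31 statement and used an English label starting with "Sandbox-".

```sparql
SELECT ?item ?label
WHERE
{
  # items marked as instance of (P31) Wikidata Sandbox (Q4115189)
  ?item wdt:P31 wd:Q4115189 ;
        rdfs:label ?label .
  # keep only English labels that follow the "Sandbox-" naming pattern
  FILTER(LANG(?label) = "en" && STRSTARTS(STR(?label), "Sandbox-"))
}
```

The result could serve as the list of items to nominate for deletion after the event. Items that lack the P31 statement would not be found this way and would still need the manual checks described above.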

Author Qualifiers

In the Item - Q57077013 , I would like to use qualifiers to distinguish Coordinating Lead Authors, Lead Authors & Review Editors.

Are the properties object has role (P3831) & subject has role (P2868) the right qualifiers?

Or,

should I be using another property more suited for authors, Any suggestions?  – The preceding unsigned comment was added by Wallacegromit1 (talk • contribs).

subject has role (P2868) is correct; object has role (P3831) as a qualifier of author name string (P2093) or author (P50) is wrong, because the report is the subject and the authors are the objects. --Pasleim (talk) 09:13, 13 October 2018 (UTC)

Thanks for your quick feedback. But when I add object has role (P3831), a message pops up saying:

"object has role is not a valid qualifier for author name string (P2093) – the only valid qualifiers are: series ordinal, of, subject has role, or, affiliation"

A bit more clarity, please!

. --- Wallacegromit1


  • It seems it was mixed up on Property:P2093 as well. I corrected that, but some 150 items need fixing too. --- Jura 17:31, 13 October 2018 (UTC)
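
To see which role qualifier each author statement on the item currently carries, something like the following could be used. It is a sketch that takes no side on which qualifier is the right one: it lists the author name string (P2093) statements of Q57077013 together with their series ordinal (P1545) and whichever of subject has role (P2868) or object has role (P3831) is present. The variable names are arbitrary.

```sparql
SELECT ?name ?ordinal ?subjectRoleLabel ?objectRoleLabel
WHERE
{
  # all author name string (P2093) statements on the report
  wd:Q57077013 p:P2093 ?statement .
  ?statement ps:P2093 ?name .
  # qualifiers are optional so that unqualified authors are listed too
  OPTIONAL { ?statement pq:P1545 ?ordinal . }
  OPTIONAL { ?statement pq:P2868 ?subjectRole . }
  OPTIONAL { ?statement pq:P3831 ?objectRole . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

Statements with a value in the column for the qualifier that ends up being considered wrong are the ones that would need fixing.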

Several "Wikilinkproblems" moved from talk page

Can someone wikilink en:Babić (Q20519335) with de:Babić (Q797730) ? 178.3.19.113 17:14, 10 October 2018 (UTC)

Can someone wikilink en:Čović (Q21487827) with de:Čović (Q341615) ? 178.3.19.113 17:17, 10 October 2018 (UTC)

Can someone wikilink en:Ivanić (Q21513651) with de:Ivanić (Q16849113) ? --178.3.19.113 17:20, 10 October 2018 (UTC)

Can someone wikilink en:Izetbegović (Q56538786) with de:Izetbegović (Q1255192) ? 178.3.19.113 17:23, 10 October 2018 (UTC)

Who deleted all the interwikilinks of Bosnian politician surnames ? --178.3.19.113 17:23, 10 October 2018 (UTC)

Can someone wikilink en:Ivanović (Q21507848) with de:Ivanović (Q526937) ? --178.3.19.113 17:27, 10 October 2018 (UTC)

Moved by Liuxinyu970226 (talk) 03:17, 13 October 2018 (UTC)

 Not done family name has to use a different item than disambiguation pages (Q27924673) --Liuxinyu970226 (talk) 03:18, 13 October 2018 (UTC)
We definitely need some way to link items of these two types to each other; something akin to is a list of (P360). Suggestions? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:30, 13 October 2018 (UTC)

Ethnic and religious composition of human settlement

How can I store data about the ethnic and religious composition of a human settlement (Q486972) at many different points in time (Q186408), if I know the number of representatives of a particular ethnic group (Q41710) or religious organization (Q1530022) at each point in time (Q186408)? - Kareyac (talk) 14:02, 13 October 2018 (UTC)

You could use Wikimedia Commons and Data namespace. strakhov (talk) 18:16, 13 October 2018 (UTC)
@strakhov. Thank you, I'll consider your proposal when choosing a solution. - Kareyac (talk) 08:48, 14 October 2018 (UTC)

Double

Q12410695 was a double of Q7136547. I merged it there. Can somebody please delete Q12410695? Debresser (talk) 16:34, 13 October 2018 (UTC)

Please merge items with the instructions listed on Help:Merge next time. I've turned the former item into a redirect. Sjoerd de Bruin (talk) 17:16, 13 October 2018 (UTC)

A person's full or official name

Full name (passport name). There are several ways to indicate a person's name, but I cannot find a simple way to indicate a person's full name. An item usually uses the name a person is known under, which most of the time equals the article name on Wikipedia. Take Betty Ford (Q213122), who has the full name Elizabeth Anne Ford, probably used as her name in official documents and her passport. I am not able to find a property reflecting this. I can find birth name (P1477), but for Betty Ford the birth name is Elizabeth Anne Bloomer, and her married name (P2562) and name in native language (P1559) will be Betty Ford or derivatives of that with Ford as family name. Is there a need for a property reflecting a person's full name? For women who changed their last name when they married, I would find a property for a person's full or official name useful. Pmt (talk) 22:06, 13 October 2018 (UTC)

  • There is P1448 for this ("official name"). --- Jura 22:33, 13 October 2018 (UTC)

Quality of a source needs to be documented and communicated

I am comparing the data of Nobelprize.org and Wikidata (see Listeria list) and have run into the problem of which sources to trust.

My suggestion is that we formalize how Wikidata documents a source it uses (or start with sources that are WD properties):

  1. We start adding some quality guidance on what type of source this is, e.g.
    1. Basic facts
      1. Number of people working with this source (fulltime/community...)
      2. When the organisation maintaining it was started
      3. If they have people full time hired
      4. If they have a documented quality process
      5. If they have controlled change management
        1. Do you get a ticket when reporting problems and how do you report problems
        2. Can you review older versions of the source
        3. Is a reported change request possible to track
        4. Is this source Data driven
          1. Can we access it using SPARQL and federated search
          2. Can we access primary sources used
          3. Can we access a digital version of primary sources used
          4. Do they have external identifiers (en:Linked data) compare 5stardata.info
          5. Do they have Wikidata as a "same as" property, or do WD and the external source "share" an external property like VIAF ID (P214)
    2. Quality reviews of experts
      1. e.g. is this source used by documented authorities in the field
    3. Quality reviews of Wikipedians
      1. Comments from people who have used this source: what are their experiences
      2. Where to find reported problems/mismatches
    4.  ???

- Salgo60 (talk) 05:27, 14 October 2018 (UTC)

  • Much of this has little to do with the quality of a source. For example, a database maintained by one reputable academic about an area in which he or she is expert can be fine; a database maintained by a group of followers of Lyndon LaRouche, or left as a legacy from the Stalin-era Soviet Union or Nazi Germany is inherently suspect no matter how formally proper it might seem and no matter the formal credentials of its participants. - Jmabel (talk) 01:38, 15 October 2018 (UTC)
@Jmabel: It has. If you don't try to write something down, then you don't communicate your opinion or your understanding. I can see a bigger need to document the quality of sources
  • with the increasing size of the Wikidata project
  • the project is getting more and more Global
  • number of added properties is "exploding" and it is more difficult to understand the value/quality of a source
I did a test (in Swedish) to document one of the best sources in Sweden, Dictionary of Swedish National Biography (P3217), a source that 99% of the people who have studied history at a Swedish university trust, but I guess nearly no one outside Sweden knows about it; see link for some facts
- Salgo60 (talk) 10:39, 15 October 2018 (UTC)
I agree that there is a need for that. I don't know how to realize it. --Marsupium (talk) 20:55, 16 October 2018 (UTC)
@Marsupium: Better something than nothing?!?! Can't we start with something like featured article (Q17437796) but for a source... when you start looking at historical famous people in Sweden (e.g. Selma Lagerlöf (Q44519) Q44519#P569) we are getting +10 sources indicating a birth date ==> would be nice to filter/order them on trusted sources - Salgo60 (talk) 07:30, 17 October 2018 (UTC)
@Salgo60: Fine, here you have my brainstorming now ;):
I really feel the same and Q44519#P569 is a good example. I think we should even partly remove sources. A short previous related discussion is Wikidata:Project chat/Archive/2017/10#Redundant Wikipedia citations. But it is a hard task. I agree with Jmabel that most of the above mentioned doesn't say much about quality of a source (some we have already established ways to indicate, e.g. SPARQL endpoint).
Though as a small start: In the example Q44519#P569 I guess for an algorithmic evaluation https://web.archive.org/web/20160401152316/http://jeugdliteratuur.org/auteurs/selma-lagerlof should be ranked lowest, as a bare URL without author (P50) or other subproperty of (P1647) of creator (P170), publisher (P123) or anything else, then Find a Grave (Q63056) (and some of the others?) as a crowdsourced source. Perhaps we should state Find a Grave (Q63056) instance of (P31) "crowdsourced work" (no item for that yet). Then it gets difficult. Another criterion would be whether a source is peer-reviewed.
For "Where to find reported problems/mismatches" deprecated statements referenced with a source and their proportion of all statements can be counted. Also the uses of Template:External reference error reports can help, also with "Do you get a ticket when reporting problems and how do you report problems". We should get more of that in the main Wikidata database to enable querying that information. Also bug tracking system (P1401) is somehow related.
BTW: Do you have a specific use case for this? I think it might be good to try to handle a specific use to figure out how to deal with this in general. --Marsupium (talk) 10:00, 17 October 2018 (UTC)
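As a rough illustration of the "algorithmic evaluation" idea above, here is a minimal Python sketch that sorts the references on a statement into coarse tiers. The `Reference` class, the tier numbers, the `Q_EXAMPLE` identifier and the choice to put peer-reviewed sources on top are all assumptions made for illustration; only the two lowest tiers (bare URL, then crowdsourced sources such as Find a Grave (Q63056)) come from the suggestion in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the set of items treated as crowdsourced.
# Q63056 (Find a Grave) is the example named in the discussion.
CROWDSOURCED_ITEMS = {"Q63056"}


@dataclass
class Reference:
    """Hypothetical, simplified view of one reference on a statement."""
    label: str
    stated_in: Optional[str] = None  # item ID of the source work, if any
    has_creator: bool = False        # author (P50) or another creator-like property
    has_publisher: bool = False      # publisher (P123)
    peer_reviewed: bool = False      # assumption: known from some other statement


def trust_tier(ref: Reference) -> int:
    """Return a coarse tier; a higher number means ranked higher."""
    if ref.stated_in is None and not (ref.has_creator or ref.has_publisher):
        return 0  # bare URL with no author, publisher or source item
    if ref.stated_in in CROWDSOURCED_ITEMS:
        return 1  # crowdsourced source
    if ref.peer_reviewed:
        return 3  # assumption: peer review puts a source in the top tier
    return 2      # everything else


def rank_references(refs: List[Reference]) -> List[Reference]:
    """Order references from highest to lowest tier."""
    return sorted(refs, key=trust_tier, reverse=True)


refs = [
    Reference("bare archived URL"),
    Reference("Find a Grave", stated_in="Q63056"),
    Reference("hypothetical peer-reviewed biography", stated_in="Q_EXAMPLE",
              has_creator=True, peer_reviewed=True),
]

for ref in rank_references(refs):
    print(trust_tier(ref), ref.label)
```

The tiers are deliberately coarse; as noted above, anything beyond the first two levels gets difficult and would need agreed criteria.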
@Marsupium: the use case I have is a federated search comparing Wikidata and Nobelprize.org in a Listeria list link
  • Lesson learned
    • Wikidata is excellent at quickly getting the correct death date
    • As a Nobel prize is global, I quickly run into the problem of finding sources you don't know anything about
    • My dream scenario is that Wikidata produces lists like above indicating a difference AND that we also present the best sources confirming the facts
  • Rank sources
    • it's difficult but for Q44519#P569 we have the church books i.e. en:Primary Sources in electronic form link telling that Selma Olivia Lovisa is born..... if you can read that source then that is what you trust... we also have one of the best ranked sources (in Swedish) link ==> it's a secondary source but written by professionals using primary sources
- Salgo60 (talk) 10:24, 17 October 2018 (UTC)

Denmark

Is it possible to solve the problem of which state (Q35 (Denmark) or Q756617 (Kingdom of Denmark)) Danish locations and Greenland locations belong to? There cannot be two states. Other properties are affected by this problem, too. I do not understand the difference between both. There is no English article on Q756617. --RolandUnger (talk) 09:13, 14 October 2018 (UTC)

For a start, I think en:Denmark is about Kingdom of Denmark (Q756617), but has been incorrectly connected to Denmark (Q35). I don't know much about Denmark, but I think it's something similar to Kingdom of the Netherlands (Q29999) and its constituent countries, which can also confuse people in Wikidata. Ghouston (talk) 10:13, 14 October 2018 (UTC)
Kingdom of Denmark is the official name of Denmark, so it should be the same. At least, we need a unique value for country (P17). If Denmark is a state then Kingdom of Denmark is maybe only a name of a state but not the state itself. --RolandUnger (talk) 11:06, 14 October 2018 (UTC)
No, I believe Kingdom of Denmark includes Denmark proper, Faroe Islands, and Greenland. For example, Denmark is part of the EU, whereas Faroe Islands and Greenland (and, consequently, the Kingdom of Denmark) are not.--Ymblanter (talk) 12:10, 14 October 2018 (UTC)
Something similar exists between Netherlands (Q55) and Kingdom of the Netherlands (Q29999). However, it's complicated by the fact that the Caribbean Netherlands (Q27561) are part of both, while Curaçao (Q25279) only belongs to Kingdom of the Netherlands (Q29999). —Rua (mew) 12:37, 14 October 2018 (UTC)
We had the following request some time ago: Wikidata:Bot_requests/Archive/2016/12#Country_→_Denmark --- Jura 12:46, 14 October 2018 (UTC)
But I think the request was discussed but not executed. So we have the situation that cities like Copenhagen (Q1748) are situated in the state of Denmark (Q35) and cities in Greenland (Q223) are situated in the state of Kingdom of Denmark (Q756617), but these are the same country! And if you look at the data sets of both Denmarks then you can see that everything is completely mixed up in the Wikipedias. Normally we use country names like Germany (Q183) without political description (official name is Federal Republic of Germany). Other countries like France and the Netherlands have overseas territories, too, but there is only one state. I think Danish authors should help to remove the confusion. --RolandUnger (talk) 16:41, 14 October 2018 (UTC)
  • A possible way to resolve it would be to unify the interpretation of country of citizenship (P27) and country (P17): if a "country" is part of a larger state, but doesn't have its own citizenship, then don't allow it as a country (P17) target. That would mean changing quite a few statements for the Netherlands, but would avoid changing a lot for the UK. I can't see any indication either that Greenland (Q223) and Faroe Islands (Q4628) have separate citizenships. I think it would make sense, since a state shouldn't be treated differently in Wikidata depending on whether it calls its internal subdivisions "countries" or "states". Ghouston (talk) 20:53, 14 October 2018 (UTC)
    • Another option which doesn't require changing the Netherlands or the UK would be to allow things which have their own ISO 3166-1 codes. It's a widely-used international standard, so it seems reasonable to say that the things it lists are often considered to be countries. - Nikki (talk) 21:29, 14 October 2018 (UTC)
      • But then we'd have different definitions of country for country of citizenship (P27) and country (P17). ISO 3166-1 would also make places like Hong Kong, Macao and "United States Minor Outlying Islands" into countries. Ghouston (talk) 22:43, 14 October 2018 (UTC)
        • I don't think that's a problem. They're different properties, they can have different allowed values (and these are not the only two country properties we have, see country for sport (P1532) for example, a single set of allowed values for all country properties is not possible). country (P17) has a wide variety of uses so I don't think it makes sense to require the narrow definition that makes sense for country of citizenship (P27). - Nikki (talk) 10:37, 15 October 2018 (UTC)
          • Sure, but I don't see what's to be gained in this case by treating the Netherlands and Denmark differently to other states (or are there other exceptions too?). Many of them have regional parliaments and semi-autonomous areas. If "Kingdom of the Netherlands" was given an alias "Netherlands", and the existing statements assigned to it, then presumably it would appear first when selecting values and there'd be less confusion. The only advantage I can see in treating the Netherlands and Denmark differently to say the UK is that it's sort of the status-quo, and maintaining the status-quo seems to be an incredibly powerful force on Wikidata. I'm not sure why that is, for such a relatively new project. Ghouston (talk) 00:02, 16 October 2018 (UTC)

Soundex formatter URL

The formatter URL on Soundex (P3878) is https://www.wikidata.org/wiki/Special:Search?search=haswbstatement%3A"P3878%3D$1" . Is that correct? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:19, 14 October 2018 (UTC)

Seems useful and better than nothing. @Jura1: let me mention as you haven't been mentioned. Sjoerd de Bruin (talk) 12:41, 14 October 2018 (UTC)
I'm unclear how providing such a link to one of our data consumers is "Useful"; please can you enlighten me? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:22, 14 October 2018 (UTC)
Can you enlighten us what "consumer" you have in mind? --- Jura 13:34, 14 October 2018 (UTC)
Any. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:18, 14 October 2018 (UTC)
Can we have a sample use case? --- Jura 14:20, 14 October 2018 (UTC)
I hope so - that's what I'm asking for; the use case for providing this URL to data consumers. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:24, 14 October 2018 (UTC)
I don't know. You were talking about data consumers. Maybe you can enlighten us how they get it and what you have in mind. --- Jura 14:28, 14 October 2018 (UTC)

There being no use case for this formatter URL, I propose to remove it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:12, 15 October 2018 (UTC)

  • I think you omitted to answer the question above. Do I need to translate it in some other language? --- Jura 17:14, 15 October 2018 (UTC)
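For readers unfamiliar with formatter URLs: the `$1` in the URL quoted at the top of this section is a placeholder that gets replaced by the statement's value. The following Python sketch shows that substitution for the Soundex (P3878) formatter URL under discussion. The function name `expand_formatter_url` and the sample value `R163` are illustrative assumptions, and percent-encoding the value before substitution is a choice made for this sketch, not a documented Wikidata rule.

```python
from urllib.parse import quote

# Formatter URL as quoted in the discussion above; "$1" is the placeholder.
FORMATTER_URL = (
    'https://www.wikidata.org/wiki/Special:Search'
    '?search=haswbstatement%3A"P3878%3D$1"'
)


def expand_formatter_url(formatter_url: str, value: str) -> str:
    """Replace the $1 placeholder with the percent-encoded value."""
    return formatter_url.replace("$1", quote(value, safe=""))


# "R163" is only a sample Soundex code used for illustration.
print(expand_formatter_url(FORMATTER_URL, "R163"))
```

With this particular formatter URL, the resulting link is a search on Wikidata itself for other items carrying the same Soundex value, which is the behaviour being questioned above.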

way of itemstructure

I'm a bit stuck on how to model this: how do I state, in the right way, that Walther Freise (Q57313462) was chairperson (Q140686) of Naturforschende Gesellschaft der Oberlausitz (Q1970282), including dates? (I did it via part of (P361).) Thank you for your help, and thank you very much for your work, Conny (talk) 13:28, 14 October 2018 (UTC).

Solar Hijri Calendar

Hi, "time" data type does not support SH calendar. We don't use Gregorian calendar so we don't convert dates from SH to AD. For example when I want to edit date (property: inception) for item Q4819111 I don't know exact equivalent in AD calendar. How can I edit in my own language? --دوستدار ایران بزرگ (talk) 21:27, 14 October 2018 (UTC)

Wikidata does not yet support calendars other than the Gregorian and Julian calendars. There are some Phabricator tasks open for expanding this: See for example this task for Hebrew calendar support. You can open a similar task for Solar Hijri support on Phabricator. --Yair rand (talk) 23:05, 14 October 2018 (UTC)
✓ Done Thanks I made a thread in Phabricator. --دوستدار ایران بزرگ (talk) 06:32, 15 October 2018 (UTC)

Countries and their subdivisions and territory

The question of which country a region or subdivision is part of can sometimes be quite complicated, and our current handling of this is very imprecise and occasionally inconsistent. I'd like to establish some standards for this and add them to the relevant documentation. To summarize some of the relevant things to take into account:

  • Some items correspond to regions: geographical features (islands, peninsulas, etc) or other areas which don't exist as a part of any country's set of subdivisions, but the territory of which is within one or more countries. Some items correspond to subdivisions, which may exist within the set of subdivisions used by only one country among a set of countries that the corresponding territory may be located in, or the same subdivision may be used by multiple countries that the area may be in. (That is, there are occasionally competing subdivision structures over the same area, with different items, and sometimes not.)
  • Territories can be controlled and administered by a country; a particular country can be "the government" over an area, in practice. Regular administration can be military or civilian. Under certain circumstances, control may be exercised by an alliance of countries, sometimes working under an international organization. Occasionally, military control and day-to-day civil administration of the same area may be run by different countries.
  • A country can claim a territory as their own. This can be done with or without claiming the territory as part of the country proper. Disputed claims between countries can be further complicated by the fact that a local government of a subdivision can have its own opinion, and claim itself to be part of a particular country.
  • A territory can be internationally recognized as being part of a country. Any country or group of countries can recognize an area as belonging to a country. (I'm unsure to what extent this existed before modern history.) There can be hundreds of different countries expressing opinions on this, so presumably we don't want to duplicate the whole list many times.

The existing properties used in this area are: country (P17), contains administrative territorial entity (P150), located in the administrative territorial entity (P131), and territory claimed by (P1336). (Also somewhat relevant are coextensive with (P3403) and territory overlaps (P3179), which manage the relationship between regions and subdivisions of the same area.)

Using the existing properties, along with possibly new properties or qualifiers, we should clarify how these properties should be used to express the above information on regions and subdivisions, keeping several priorities in mind:

  • Specificity: Items should ideally include as much of the data as possible, as unambiguously as possible. Users should be able to query any specific element of association between a territory or subdivision and a country.
  • Simplicity: Editors should be able to easily figure out how to format a statement for any such kind of relation, without having to read endless complicated documentation.
  • Minimalism: Most areas are undisputed, and are claimed by, controlled by, administered by, and recognized as being part of, only one country. We should try to minimize the number of extra statements so that we don't need to add extras to every such ordinary situation.

Essentially all of the ways a subdivision/area can be connected to a country are not dependent on each other, making everything rather complicated. To take a fictionalized example, in case this helps:

The government of Q01 lists among its administrative subdivisions the Q02, which corresponds to the territory/area of the geographic feature Q03, which is claimed by Q04 as subdivision Q05, and administered by Q06 (as a different type of subdivision, which has a separate item Q07). The local government of Q02 itself considers itself to be part of Q07. The international community considers the area to be part of Q08 and recognizes it as such.
What statements should be used on each of these items to convey this information?

How should this data be ideally structured? --Yair rand (talk) 23:34, 14 October 2018 (UTC)

Wikidata:WikiProject Country subdivision has some pre-existing work in this area, but it seems to have been largely dormant for quite a while. There's still a lot of even quite basic work to do in this area (e.g. how much is missing or red on Wikidata:WikiProject Country subdivision/Items, even in terms of the first-level administrative country subdivision (Q10864048) in a lot of countries) so making sure that the underlying concepts are in place and well-documented so that people can help out by filling in a lot of those gaps, would be very useful. --Oravrattas (talk) 14:00, 16 October 2018 (UTC)
I've linked to this thread from the WikiProject talk page. --Yair rand (talk) 20:51, 16 October 2018 (UTC)

Taxonomy: concept centric vs name centric

Hi there,
I'm a new editor to Wikidata but I'm already an extensive consumer of data by Wikidata using the SPARQL interface.
During the last year, there's been one point that's really bothered me. My plan is to fix it, first by hand, later by bot. However, already my first edits were reverted, so maybe I have first to reach out.
The point is that a concept with multiple names normally gets modelled by one item. The exception to that is taxa, which have one item per scientific name. Thus, a taxon with multiple scientific names has multiple items.
This generates a few problems:

  1. For data consumers (and likely new editors) it's difficult to understand the data structure if multiple ways of modelling are used across Wikidata.
  2. Links to Wikipedia are split; if different language versions of Wikipedia describe the same taxon under different names they are not linked to each other.
  3. Data is added to multiple items. For example "Image" or "taxon common name" are added on each item about the same taxon.

How can we resolve the problem? My solution is to merge items about the same taxon and use ranks and qualifiers to indicate which statements are outdated or preferred. 130.92.255.36 07:56, 15 October 2018 (UTC)
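To make the merge proposal above concrete, here is a small Python sketch of a single item holding several taxon name (P225) statements, with ranks marking which name is currently preferred. The dictionary layout is a simplified stand-in for Wikidata's statement structure, and both scientific names are made-up placeholders; the selection rule (use preferred-rank values if any exist, otherwise normal-rank ones, never deprecated ones) mirrors how "best rank" values are usually picked.

```python
from typing import Dict, List

# Hypothetical merged item: one concept, several names, ranks mark the current one.
merged_item: Dict[str, List[Dict[str, str]]] = {
    "P225": [
        {"value": "Oldgenus exemplum", "rank": "normal"},     # earlier name (placeholder)
        {"value": "Newgenus exemplum", "rank": "preferred"},  # current name (placeholder)
    ]
}


def best_values(statements: List[Dict[str, str]]) -> List[str]:
    """Return preferred-rank values if any exist, else normal-rank values."""
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]


print(best_values(merged_item["P225"]))
```

A data consumer would then read one item and get the preferred name by default, while the earlier name stays available on the same item.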

99of9
Abbe98
Achim Raschka (talk)
Brya (talk)
Dan Koehl (talk)
Daniel Mietchen (talk)
Faendalimas
FelixReimann (talk)
Infovarius (talk)
Jean-Marc Vanel
Joel Sachs
Josve05a (talk)
Klortho (talk)
Lymantria (talk)
MargaretRDonald
Mellis (talk)
Michael Goodyear
MPF
Mr. Fulano (talk)
Nis Jørgensen
Peter Coxhead
PhiLiP
Andy Mabbett (talk)
Plantdrew
Prot D
pvmoutside
Rod Page
Soulkeeper (talk)
Strobilomyces (talk)
Tinm
Tom.Reding
Tommy Kronkvist (talk)
TomT0m
Tubezlob
RaboKarbakian
Circeus
Notified participants of WikiProject Taxonomy --- Jura 08:02, 15 October 2018 (UTC)

  • I added "taxonomy" to the section header above and pinged the relevant WikiProject. BTW it's not really specific to the field that links to Wikipedias don't follow Wikidata's structure. Merging items to connect random elements isn't really a good idea. --- Jura 08:01, 15 October 2018 (UTC)
    • I understand that connecting random elements isn't a good idea but my plan is to merge items describing exactly the same taxon, just known under different scientific names. As analogy, look at "Louis XIV of France". He is known under many different names, still only one item describes him. --130.92.255.36 08:20, 15 October 2018 (UTC)
Agree this is currently a major nuisance. As an example, the species formerly known as Madanga ruficollis (Q178830) was recently reclassified as Anthus ruficollis, but I know if I change the taxon name (P225) to that (apparently sacrosanct and "unchangeable", but nothing to indicate it, nor to make the change impossible), someone will just revert it. Currently, as far as I can see, it has to wait for someone to create a new item for it here, and transfer all the links across to it - very cumbersome! - MPF (talk) 08:41, 15 October 2018 (UTC)
  • In items here about taxa, the taxon name is leading. If an author publishes a paper to suggest that a species should be placed into a different genus - say that Madanga ruficollis has to be placed in the genus Anthus like in the example above - it is a misunderstanding that the earlier name has been rejected or disappeared. First of all authors may have different opinions on the issue. But also the earlier name has been published (often many times, for instance on a Wikipedia). You will see that taxonomy publications on Anthus ruficollis will mention the earlier name Madanga ruficollis. The most correct way on Wikidata is to make a new item on the new name and keep the old one - linking them by taxon synonym (P1420). Of course the confusion is understandable, since the plants or animals at hand do not change by placement in a different genus. One might think that just changing taxon name (P225) would be sufficient. However, the placement of a species in a different genus reflects more than just renaming, it shows new insights and/or opinions in the placement of the species in the tree of life in relation to other species. The header "concept centric vs. name centric" hence is not reflecting correctly what is going on as the name reflects the concept and the taxonomical concept has indeed changed. For convenience reasons, sitelinks may be collected at one item. Lymantria (talk) 09:22, 15 October 2018 (UTC)
  • True, it's just that the procedures for doing all this at Wikidata are so obscure and impenetrable. And if changing P225 is not the way to do it, why is this not locked to make it impossible to do accidentally? Is there really no easier way? - MPF (talk) 09:45, 15 October 2018 (UTC)
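For comparison, the name-centric approach described above (one item per name, linked by taxon synonym (P1420)) can be sketched in Python as follows. The item layout is a simplified assumption, `Q_NEW` and `Q_OTHER` are placeholder identifiers for items that may not exist, and following P1420 links in both directions is a choice made for this sketch so that a consumer can start from either name.

```python
from collections import deque
from typing import Dict, List, Set

# Simplified items: one per scientific name. Q_NEW and Q_OTHER are placeholders.
items: Dict[str, Dict] = {
    "Q178830": {"P225": "Madanga ruficollis", "P1420": []},
    "Q_NEW": {"P225": "Anthus ruficollis", "P1420": ["Q178830"]},
    "Q_OTHER": {"P225": "Unrelated exemplum", "P1420": []},
}


def synonym_group(items: Dict[str, Dict], start: str) -> Set[str]:
    """Collect every item reachable from `start` via P1420, in either direction."""
    neighbours: Dict[str, Set[str]] = {qid: set() for qid in items}
    for qid, item in items.items():
        for target in item["P1420"]:
            neighbours[qid].add(target)
            neighbours[target].add(qid)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def names_for(items: Dict[str, Dict], start: str) -> List[str]:
    """All scientific names linked to the starting item, sorted alphabetically."""
    return sorted(items[qid]["P225"] for qid in synonym_group(items, start))


print(names_for(items, "Q178830"))
```

Here each name keeps its own item and identifiers, and a consumer who wants "everything about this species" has to walk the synonym links, which is the extra step the original poster objects to.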


  • I fully agree with the initial comment in this section and in fact because there is no agreed solution to this problem Wikidata is no longer of much interest to me. I am mostly concerned about fungi, which are normally known by their scientific names (at least in English), and those names are currently undergoing enormous numbers of changes with new genera constantly appearing. For these homotypic synonyms there is no controversy at all that the species are exactly equivalent (it is true that there also exist cases where there are conflicting views as to the exact equivalence and those cases are more complicated to handle, but they are much less frequent). We need Wikidata items for all the important synonyms, but one of the items should be selected to mean the real organism; the other items should be restricted to taxonomical information. The special "Wikidata" item should contain the wikilinks and all properties which belong to the organism (and which do not need to be duplicated for all the items). We should avoid language which implies that the selected "Wikidata" name is the right one and that the other names are wrong. Almost the only thing that Wikidata does for the Wikipedia projects at present is provide the interlanguage links, but the various language versions may naturally happen to use different names for the same species, so unless one special "Wikidata" item is chosen for all, the various language pages will not be correctly linked together.
I think the title "concept centric vs. name centric" reflects well the main issue here. It is true that the exact definitions of the various names are not identical, but I think the place to document that is in the Wikipedia article, of which there should probably be only one. All the descriptions are attempts to define the same actual species which exists in nature and if there are conflicting definitions, and someone wants to include that information in Wikidata, I think that it should be documented by using references and author attributes to multiple properties on the same main organism item.
I made a detailed proposal for a solution in the discussion of the synonym property, but unfortunately it was not agreed. I proposed that only the special item representing the organism should have instance of (P31) = taxon and the others should have instance of (P31) = synonym, but some other method could also be used. In difficult cases multiple special "Wikidata" items could be allowed. Some objections were raised but I think they have solutions and anyway it is a very high priority that Wikidata should be able to have selected organism-level items. Otherwise I think the Wikidata data model is failing to support "tree of life" information satisfactorily. Strobilomyces (talk) 10:00, 15 October 2018 (UTC)
    1. This is not really the proper venue for this discussion, which should be at Wikiproject Taxonomy.
    2. Without a detailed case-by-case study it is not really possible to tell if something is the same taxon. Taxa are not necessarily stable, and may be highly dynamic. For example, to most authors Pentaphylacaceae consists of one species, but there are those who consider it to comprise some five hundred species. In most cases, the only known way to track taxa is by scientific name (or sometimes clade names); there is, as yet, no way to track taxa by other means. So there would be nothing to base items on (and once it does become possible to have identifiers for all taxa ever recognized, it likely will prove that these are very, very many).
    3. It is easy to track scientific names. One name, one item makes it easy to gather database identifiers and references referring to that name. Data retrieval is also easy, as in the enwiki taxonbar (pointed out above).
    4. Something of a problem is the placement of the sitelinks. All the sitelinks of homotypic names can be placed together in one item (so as to have them linked), as long as not too much value is attached to what item they are in. Various other solutions have been proposed, but as yet nobody supported an approach suggested by any other user.
    5. Another problem is that from time to time, there are users who feel that there is, or should be, a single accepted name for each and every taxon. Besides not being in the spirit of the WMF, there is the slight problem that these users do not necessarily agree with each other, or with most of the world literature. - Brya (talk) 10:55, 15 October 2018 (UTC)
  • @Brya: Thanks for your answer.
1. I would certainly be happy to have the discussion at Wikiproject Taxonomy.
2. In general it can be difficult to tell if two taxa are synonyms or not, but in many cases there is no doubt, for instance the fungus Marasmius alliaceus now has a new name Mycetinis alliaceus and it is absolutely clear that the two terms refer to exactly the same mushrooms, since the change was at the genus level. There are many hundreds of cases like that in the fungi and it would be useful to have a solution for those simple cases. I suppose that if there is doubt it may be necessary to keep the names separate but I think that the claim of synonymy should be with a reference and so if it is sometimes wrong, that does not mean Wikidata is wrong. Taxonomic databases often give synonym information and that is what I propose we should use to determine if taxa are the same. There will be difficult cases, but it is normal to have to take that sort of decision in this context. Wikipedia pages can sometimes cover multiple closely related groups and we should not need a separate taxon item in Wikidata unless a separate Wikipedia page is necessary. It would be an immense service to users to provide items for particular organisms even if the meaning of the organism is a bit vague and it is necessary to read the Wikipedia article to understand exactly what the various names mean.
If I understand you right, your example is at the family level; in the past family Pentaphylacaceae consisted of only one species, but DNA work has shown that that species belongs in the clade of family Ternstroemiaceae and due to the priority rules the name of the combined family has to be "Pentaphylacaceae", so it has gained 500 species. By the way, the author string of the family does not change in this process! Well, I think my proposal needs to apply mostly at the species level, which I think is the "organism" level. Perhaps I should not try to apply it at the family level.
3. It is true that because of the strict nomenclature rules, names are easier to track than taxa, but it is a terrible disadvantage if we don't have Wikidata items which correspond to the real species (and perhaps other levels). I don't think that we should give up and I think we can track the taxa using the Wikipedias, once they are covered. I think it is OK if initially Wikidata is loaded with thousands of items at the name level from external databases, but in those cases where the organisms get articles in language Wikipedias or categories in Wikimedia Commons we should improve the quality of the data. Then I think that synonyms should be linked together and one "Wikidata" item should be selected, so that we will have an item at the organism level.
4. I am very glad to see you say "All the sitelinks of homotypic names can be placed together in one item". I have done this in certain cases, but I was worried that I was breaking some rule. I also agree that not much value should be attached to the question of which item they are in. If there are non-taxonomic properties (which belong to the organism), they should also be added in the item with the sitelinks. In fact this means that where an organism has pages in at least two Wikipedias, we do actually have a selected organism-level item. If this could just be agreed as a formal principle, I would be much happier.
5. Yes, it is a problem if users are intransigent about pursuing their particular taxonomy, and I think that is one reason why we need clear rules about how to deal with divergent taxonomies. Strobilomyces (talk) 15:11, 15 October 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── A pointer to this discussion has been posted at Wikiproject Taxonomy; and members of the project have been pinged here. This venue is fine.

If all the sitelinks of homotypic names are to be placed together in one item, then that item would need to have all those names as aliases; there has previously been strong resistance to using such aliases. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:40, 15 October 2018 (UTC)

That would only be the case if the intent is to cause maximum confusion. - Brya (talk) 17:00, 15 October 2018 (UTC)
  • Using the number of Wikipedia sitelinks as a primary criterion to define items is probably the most complicated approach. For Wikipedias, it doesn't really matter across how many items the information is spread: they can easily load from hundreds of structured items. Anyways, from an external standpoint, it doesn't really surprise me that a field called "-nomy" is name-centric. --- Jura 17:18, 15 October 2018 (UTC)

The Color of the Bikeshed

I had the same problem, actually, with magazines. Journals, if that is a more respectable name for ya. Bird-lore became Audubon Mag, or some such. A big difference between the two is the license to use them.

The early "find and identify" taxonomy, probably uses whatever reference material they had with them at the time. "Scientific Society" notes and inclusions, maybe a book; samples brought back home and identified with whatever reference material they had available.

150 years later, I find a species name here. None of the cited taxon groups within the species and none of the wikipedias use the name. I separated them, the interesting part was "who is calling this that". It got merged again and the one site that called it by the merge name was left off (another was found or created since then). I was treated as if I was the idiot.

Wikidata is not a taxon identifier authority. If there are two news agencies reporting a thing and they both claim that a different thing happened, this is not the place to determine the correctness of one or the other. So it is with taxon names.

If a user of the data doesn't like the complicatedness of the data, the user should be instructed, then, to go fix the "science" as it has been practiced or to find something less complicated to be involved in. I would like to call this my opinion, but it is not even that! It is the fact of taxonomy.--RaboKarbakian (talk) 16:30, 15 October 2018 (UTC)

User:RaboKarbakian - as far as I can tell you created an item for the same name (two items for one name), rather than an item for the homotypic synonym. [ Brya (talk) 17:00, 15 October 2018 (UTC) (- split comment)]

homotypic synonyms

User:Strobilomyces - putting all homotypic synonyms in one item superficially looks nice (and it might have worked if the basic structure of Wikidata had been different), but it will either lose information or be frighteningly complicated, with every statement accompanied by qualifiers indicating to which name it applies. This may even be the case for something as basic as taxon rank; no reason why there may not be homotypic synonyms in three or more different ranks. It will be terribly awkward to edit or to read.
        Properties of taxa should preferably be referenced and placed with the name in the reference. What applies to one particular circumscription may not apply to a different circumscription. And many different circumscriptions for one taxon is not limited to the family-level. What is one species to one taxonomist may well be hundreds of species to another taxonomist. A notorious case is the dandelion.
        And of course we would need new properties and a new taxobox module.
        Maybe somebody will come up with a piece of software to link sitelinks in different items (the Bonnie and Clyde issue is still there as well). - Brya (talk) 17:00, 15 October 2018 (UTC)
User:Brya I am not suggesting putting all homotypic synonyms in one item, I am suggesting being allowed to mark one of them as special and to use that special one for sitelinks and for other properties which belong to the organism. Indeed there are homotypic synonyms at different ranks, that is OK. Normally properties of taxa are the same for all homotypic synonyms; they tend to be very general (example: has fruit type (P4000)). If the circumscriptions have different organism properties, I think they are not homotypic and at least that is a very special case which does not occur in the present Wikidata and which could be treated exceptionally. By giving priority to these obscure cases we are losing the opportunity to have an organism-level item in Wikidata, which could really add value to the project. I don't think that the dandelion mini-species change anything here; we can represent alternative taxonomies using references, qualifiers etc. and the same decisions have to be made with or without this proposal.
The taxobox module refers to name-related information and I actually doubt that any change to that is needed. But we would need new properties. I would like to put forward another simplified version of my proposal:
  1. The current name-based data structure would stay as it is and we would only add information to it.
  2. It would be allowed to add the new property "organism item" to taxonomy items to indicate a chosen item for sitelinks. I am not proposing a change to the software, but in time we should update the data so that those items with sitelinks have the new property and where homotypic synonyms (indicated by taxon synonym (P1420) and instance of (P31) with of (P642)) have different sitelinks, one item should be chosen and the sitelink information merged. Perhaps there could be exceptions, but I think if the Wikipedia pages are not equivalent, the items should not be homotypic synonyms. The "organism item" could be useful for other purposes, such as keeping in one place information which is independent of name. Perhaps another method of labelling the special item would be better, for instance it could be an attribute of instance of (P31) = taxon.
  3. A restriction would be that Wikipedia pages should only link to items marked as "organism items". This would not be enforced by software, at least at first, but it would solve the problem that sitelinks for the same organism can be scattered amongst different names.
  4. Also a new property "Authority for current name" should be added to allow indication (with a reference) that one of the items is the "real" current name. Conflicting views could easily be accommodated. This is a somewhat different issue, but it would make clear that the current name may be different from the "organism item".
This proposal would solve the sitelinks problem and allow organism-level data to be added without losing any information. In order to be useful I think it would just need a formal proposition and a consensus that this is a useful addition to the taxonomy part of Wikidata. Strobilomyces (talk) 19:21, 15 October 2018 (UTC)
  1. This is a quite different solution than proposed by others in this thread.
  2. This proposal would not so much solve the sitelinks problem, as move the problem to a different level. It requires a "consensus taxonomy" which does not exist in reality. - Brya (talk) 03:22, 16 October 2018 (UTC)

Separate names from concepts?

Given the number of others who have jumped in... Perhaps it would simplify things for most purposes if the taxonomic name items were completely divorced from the items representing classes (i.e. any collective grouping) of organisms? We do have a brand new Lexeme namespace specifically designed to hold information about words and short phrases, their origins, and their meanings (linking those meanings to regular Q items where appropriate). Probably we're not ready to wholesale move those millions of Q items over to corresponding L entries, but supporting the distinction between words and their meanings seems at least a sensible start: let's create independent Q items, not in the "taxon" hierarchy, for the conceptual species, genera, families, etc at least where there are corresponding wikipedia pages, and just model them separately. ArthurPSmith (talk) 18:38, 15 October 2018 (UTC)

  • It might work for Common names (currently strings on some items), but I don't think it would solve it for the actual items. You'd still need to name them ;) --- Jura 18:44, 15 October 2018 (UTC)
  • I agree that a complete set of organism items separate from the name items would solve the theoretical problems nicely and achieve the modelling aims. But I have always assumed that this would not be acceptable as it would look like duplication, be confusing and difficult to explain, increase the amount of work in updating the data, and be a big change to the current system. But in my opinion the current system is not fit for purpose - we need taxon-level information. But we would have to have a very robust consensus in order to go in that direction. Strobilomyces (talk) 19:37, 15 October 2018 (UTC)
    • @Strobilomyces: One of the reasons I suggest it is that the taxon hierarchy has always been out of sync with the way we manage basic class membership relationships in the rest of wikidata via instance of (P31) and subclass of (P279) statements. The duplication issue I think can be explained simply by looking at those class relationship statements on the items - it's not uncommon in the rest of wikidata to have multiple items with the same name, distinguished mainly by their position in the class hierarchy. ArthurPSmith (talk) 20:40, 15 October 2018 (UTC)
  • I agree with the statements of several others above. It would in some ways be best if only the currently valid name (zoological def) had a page and synonyms were referred to it. However, that may work at Wikipedia and it's the preferred method at Wikispecies also. But Wikidata is attempting to database all terms. As stated earlier any synonym is still an available name (again zoological def) it has not been disposed of in any way and can at any time be resurrected. However some way of linking the synonyms would be a good idea, it would also be useful to identify if a name is the currently accepted name or combination. Cheers Scott Thomson (Faendalimas) talk 21:03, 15 October 2018 (UTC)
  • A separate name-space for taxon names is interesting in theory. It would presumably work well for species of birds. It may work for taxa where Wikipedias have genuine pages for taxa, that is at some length and in some detail. However, there are hundreds of thousands of Wikipedia pages that don't have genuine information, but are based solely on the scientific name and taxonomic position ... - Brya (talk) 02:59, 16 October 2018 (UTC)
  • I have the impression that taxon items are supposed to represent the work of classification by a scientist, not the actual organisms involved. Since they are only "instance of taxon", and not subclasses of anything, they can't have instances. That means you need other items to represent the actual organisms, e.g., human (Q5) for humans. When you have an item like Wolf of Ansbach (Q39019), it presumably needs to be an instance of something other than Canis lupus (Q18498), which gives a constraint violation. A new "wolf" item, perhaps, or just assign it to animal (Q729), which does have a subclass? Ghouston (talk) 23:03, 16 October 2018 (UTC)
    @Ghouston: parent taxon (P171) is a subproperty of (P1647) subclass of (P279). --Yair rand (talk) 23:41, 16 October 2018 (UTC)
    Thanks, subproperties of the subclass property is a new one to me. The constraints don't know about it either. Ghouston (talk) 00:27, 17 October 2018 (UTC)
    I've adjusted the constraint on instance of (P31).--99of9 (talk) 01:50, 17 October 2018 (UTC)
    There already is a "wolf item", at Q3711329. - Brya (talk) 05:16, 17 October 2018 (UTC)
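
For anyone following the subproperty point above, here is a minimal Python sketch of how a constraint check could treat parent taxon (P171) as a kind of subclass of (P279). It only uses toy in-memory data: the item names (`species_item`, `genus_item`, `family_item`) and the helper functions are hypothetical, and this is not how the actual constraint system is implemented.

```python
# Toy data: each property maps to the properties it is declared a
# "subproperty of (P1647)". Only the P171 -> P279 link comes from the thread.
SUBPROPERTY_OF = {
    "P171": ["P279"],  # parent taxon is a subproperty of subclass of
}


def implies(prop, target, subproperty_of=SUBPROPERTY_OF):
    """Return True if `prop` is `target` or a (transitive) subproperty of it."""
    seen = set()
    stack = [prop]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subproperty_of.get(current, []))
    return False


def superclasses(item, statements):
    """Collect items reachable from `item` via P279 or any of its subproperties."""
    found = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for subject, prop, obj in statements:
            if subject == current and implies(prop, "P279") and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


# Hypothetical placeholder items, not real Wikidata IDs.
statements = [
    ("species_item", "P171", "genus_item"),
    ("genus_item", "P171", "family_item"),
    ("species_item", "P31", "taxon"),
]

print(sorted(superclasses("species_item", statements)))
```

With this reading, an item that only has parent taxon statements still counts as having superclasses, which is the behaviour the adjusted constraint is meant to allow.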

Conflated ranks

Utatsusaurus hataii (Q3053716) seems to conflate a genus and species. What's the best way to disentangle them? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:15, 16 October 2018 (UTC)

Leave it alone. Most fossil taxa are "monotypic". We are missing a lot of fossil type species. --Succu (talk) 21:58, 16 October 2018 (UTC)
Why would we "leave alone" an item which clearly conflates two distinct topics? What do other missing items have to do with this issue? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:45, 16 October 2018 (UTC)
@Pigsonthewing: It would seem to me that the genus and species for a monotypic genus both refer to the same thing. What am I missing? How are they different topics? - Jmabel (talk) 03:28, 17 October 2018 (UTC)
Actually they are not quite the same thing. However, the problem is not here but at the Wikipedias, so unless Wikipedias split these, this is the straightforward way to handle them. - Brya (talk) 05:12, 17 October 2018 (UTC)
A genus and a species are not „the same thing”. You need a species to „define” a genus. --Succu (talk) 06:12, 17 October 2018 (UTC)
Agreed, they are not the same thing: a genus is a group of species more similar to each other than to anything else. A species is a group of populations capable of successfully interbreeding that are more similar to each other than to anything else. Being monotypic does not change the definition, as it also includes the hypothetical unknown relatives in the genus, i.e. undiscovered fossil history that can be inferred from relationships. Also known as ghost lineages. Cheers Scott Thomson (Faendalimas) talk 06:28, 17 October 2018 (UTC)

Amsterdam

Why are there two entries for Amsterdam in the Netherlands? There is no difference between Amsterdam as a capital and as a municipality. Both refer to the same administrative and legal body. It is also confusing, as both now contain the same properties, like population.

Amsterdam (Q727) capital and largest city of the Netherlands

Amsterdam (Q9899) municipality in the Netherlands, containing the city of Amsterdam  – The preceding unsigned comment was added by Historazor (talk • contribs) at 12:12, 15 October 2018‎ (UTC).

Not so. The municipality of Amsterdam has a couple of populated places apart from the city, for instance Holysloot (Q672125) and Ruigoord (Q3252907). Lymantria (talk) 13:42, 15 October 2018 (UTC)
And one is shown as located within the other. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:55, 15 October 2018 (UTC)
But even if they were coextensive with (P3403) to each other, a municipality and a city are not the same. Lymantria (talk) 19:20, 15 October 2018 (UTC)
"Municipality" also seems to be ambiguous: it can refer to either a territory administered by a local government body, or by the body itself. For some places in Wikidata these have separate items, e.g., London Borough of Camden (Q202088) and Camden London Borough Council (Q5025790), and other places they don't. It's not really clear if Amsterdam (Q9899) refers to only the administrative body, or the territory as well. Ghouston (talk) 04:11, 16 October 2018 (UTC)
Both. Lymantria (talk) 06:40, 16 October 2018 (UTC)

Wikidata weekly summary #334

Query Lexemes in the Query Service

Hello all,

Graph of Lexemes derived from L2087

I’m very happy to announce that another important feature for Lexicographical Data has been deployed: the ability to query Lexemes in the Query Service.

Here are a few examples:

The queries are based on the RDF mapping that you can find here. Feel free to help improve the documentation, so people can understand how to build queries out of Lexemes.
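
As a rough illustration of what such a query can look like, here is a small Python sketch that builds a SPARQL query for Lexemes in one language and the corresponding Query Service URL. The predicates used (`ontolex:LexicalEntry`, `dct:language`, `wikibase:lemma`) are my reading of the RDF mapping and should be checked against it; the helper names `lexeme_query` and `query_url` are made up for this example, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def lexeme_query(language_item, limit=10):
    """Build a SPARQL query listing Lexemes (and lemmas) for one language item."""
    return f"""
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?lexeme ?lemma WHERE {{
  ?lexeme a ontolex:LexicalEntry ;
          dct:language wd:{language_item} ;
          wikibase:lemma ?lemma .
}}
LIMIT {limit}
""".strip()


def query_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Q1860 is the item for English; any language item could be used instead.
print(query_url(lexeme_query("Q1860", limit=5)))
```

The printed URL can be opened in a browser or fetched with any HTTP client.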

Thank you very much to Tpt who’s been doing a huge part of the work by mapping Lexemes in RDF, and Smalyshev (WMF) who made the RDF dumps available and integrated in the Query Service.

Feel free to play with it, bring some of these ideas of queries to life, and let us know if you find any issue or bug. These can be stored as subtasks of this one on Phabricator.

If you have questions about Lexicographical Data in general, feel free to write on the talk page of the project. If you have specific questions about the integration in the Query Service, you can also ping Stas onwiki or on IRC.

Cheers, Lea Lacroix (WMDE) (talk) 08:06, 16 October 2018 (UTC)

Problems with sports league level (P3983)

MisterSynergy Thierry Caro &beer&love Vanbasten_23 Malore: Notified participants of WikiProject Sports

I noted two problems about this property:

@Xaris333, Fundriver, Kooma: as main contributors --- Jura 17:28, 15 October 2018 (UTC)
Like
< Premier League (Q9448) > sport league system < Q18559 >
sports league level (P3983) < 1 >
? Xaris333 (talk) 21:04, 15 October 2018 (UTC)
@Xaris333:Yes, like that.--Malore (talk) 22:31, 15 October 2018 (UTC)
Hello,
well first off: The German description talks about sports in general; I think you should just correct the English description and you are all fine. Also, the English description doesn't really fit the name of the property.
To the suggestion: I wouldn't do this, because historically the sports league level (P3983) of a specific league has often changed because of the creation of a new league, and therefore it should be possible to place a start and end date on the league level (even though this hasn't happened yet, I think, but I could tell you some leagues where it changed and where it would be appropriate to set a start and end date on the property).
Best regards,
Fundriver (talk) 13:35, 16 October 2018 (UTC)
@Fundriver: It's possible to have two different statements that differ only by qualifiers, something like that:
sport league system: imaginary league system (normal rank)
  sports league level: 1
  start time: 1970
  end time: 1980
sport league system: imaginary league system (normal rank)
  sports league level: 2
  start time: 1980
  end time: 1990
However, maybe it's better to have two separate properties.--Malore (talk) 15:12, 16 October 2018 (UTC)
Yes, I know - it would be possible. But I think the sports league level (P3983) is a main element to describe a sports league. Otherwise you are basically saying that your league was in the same sport league system four times, only with different qualifiers. Fundriver (talk) 17:19, 16 October 2018 (UTC)
Ok, you're right. I proposed a league system property.--Malore (talk) 23:45, 16 October 2018 (UTC)
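
For reference, the layout discussed above (several statements with the same value that differ only by qualifiers) can be sketched in Python as follows. This is only an illustration using the made-up "imaginary league system" values from the thread; the class and function names are hypothetical, and treating each period as start-inclusive and end-exclusive is an assumption made so that the shared year 1980 resolves to a single level.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeagueSystemStatement:
    """One 'sport league system' statement with its qualifiers."""
    league_system: str
    level: int   # qualifier: sports league level (P3983)
    start: int   # qualifier: start time (year)
    end: int     # qualifier: end time (year)


# Same main value twice; only the qualifiers differ.
statements = [
    LeagueSystemStatement("imaginary league system", level=1, start=1970, end=1980),
    LeagueSystemStatement("imaginary league system", level=2, start=1980, end=1990),
]


def level_in(year: int, stmts: List[LeagueSystemStatement]) -> Optional[int]:
    """Return the league level valid in `year`, or None if no statement covers it."""
    for stmt in stmts:
        # Assumption: periods are [start, end), so 1980 belongs to the second one.
        if stmt.start <= year < stmt.end:
            return stmt.level
    return None


print(level_in(1975, statements))
print(level_in(1985, statements))
```

A consumer has to inspect the qualifiers to find the level for a given date, which is the extra indirection Fundriver objects to.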

Number of peoples or animals among a nationality / population / breed

Hello. Sorry for my English. Is there a property for the number of "objects" in a population (animal or human)? For example, for French people (Q121842), we don't have any estimate of the number of French people in the world. This could be very useful (and should also be allowed for animals, like Q588252). Thanks. --Tsaag Valren (talk) 09:31, 16 October 2018 (UTC)

@Tsaag Valren: We have quantity (P1114), but I'm not sure it's the right property for this. --Yair rand (talk) 20:50, 16 October 2018 (UTC)

Languages actually used in labels and titles / translations of labels and titles

While the language of a label or title must be specified, the actual language may differ:

  • Bend It Like Beckham (Q369492) The German title of that British-Indian movie is Kick it like Beckham, which is English (German distributors like the appeal of English as a modern/hip language, but they want to avoid using the kind of words and phrases that a majority of potential ticket buyers may not understand; so, bend bad, kick good).
  • Die Hand Die Verletzt (Q4162624) This episode of an American television show has a German title, probably also because it sounds cool to American ears.
  1. Is there any way in Wikidata to specify the actual language? I would find it interesting to figure out how often and which different languages are being used, and in what countries this is more prevalent.
  2. Is there a way to specify the literal meaning of a title? That television episode title above means The hand that wounds (naturally, the German title of the episode is Satan). Downsides: many translations (N*N for N languages), translations would be mostly original research.
  3. Unrelated Wiki syntax question: Any way to not have this second (numbered) list be an indented part of the second bullet item of the first list?

If these things are not possible yet, do you have an opinion on whether they might be something worth supporting in the future? --109.91.87.44 11:49, 16 October 2018 (UTC)

For literal meanings, see literal translation (P2441). - Nikki (talk) 16:16, 16 October 2018 (UTC)
Consider also applies to jurisdiction (P1001). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:17, 16 October 2018 (UTC)
Re syntax: You can use *#, if there are no intervening line breaks. --Yair rand (talk) 21:21, 16 October 2018 (UTC)

For point #1, the title (P1476) statement usually specifies a language, and you can also add language of work or name (P407). Although, language of work or name (P407) may be hard to interpret in that case, since the original name is in one language but most of the dialogue in another. Then you've also got native label (P1705), since there may be multiple title statements and who knows which would be the original. But it gets confusing. Ghouston (talk) 22:56, 16 October 2018 (UTC)

Thesis data in the Edinburgh Research Archive

Hi, following this discussion, we are in a position to begin the mass import of thesis data from the Edinburgh Research Archive. In a nutshell:

  1. ChaoticReality has developed a utility to import the thesis record metadata from ALMA catalogue entries for the Edinburgh Research Archive’s theses.
  2. Can we create and delete test Wikidata items to see that the QS export is working correctly?
  3. Following this testing phase, can we then place a bot request for the mass import, given the size of the sample we are talking about from the Edinburgh Research Archive? Stinglehammer (talk) 12:03, 16 October 2018 (UTC)

JSON-LD now on beta

Hello all,

We’re planning to add JSON-LD as a serialization format for Wikidata. This will allow, for example, easier access to RDF data from JavaScript.

This is now deployed on https://wikidata.beta.wmflabs.org. Example: https://wikidata.beta.wmflabs.org/wiki/Special:EntityData/Q64.jsonld

As with the other formats we already support (like turtle or rdf/xml), content negotiation is used if the format is not indicated by a suffix like .jsonld. The MIME type that can be used in the Accept header to request JSON-LD output is application/ld+json.

If you’re interested in this feature, please test it, and let us know if you find any issues. The related ticket is this one. If everything goes as planned, we will enable it on wikidata.org on October 31st.

Cheers, Lea Lacroix (WMDE) (talk) 13:05, 16 October 2018 (UTC)

Also pinging Cscott and Multichill who were involved in the previous discussions, and Maxlath who's working a lot with Javascript :) Lea Lacroix (WMDE) (talk) 13:05, 16 October 2018 (UTC)
Is there a description, comparable to mediawikiwiki:Wikibase/DataModel/JSON? Jc3s5h (talk) 13:59, 16 October 2018 (UTC)
I believe that mediawikiwiki:Wikibase/DataModel/ is the appropriate description, since the JSON-LD format is a direct representation of the underlying datamodel, as expressed in the existing turtle/RDFa/n-quads representations, and they don't appear to have separate description subpages. The "old" JSON format is an ad-hoc format with no standardized semantics, which is why it needed its own description. Specs for JSON-LD can be found linked from https://json-ld.org/. See also https://gerrit.wikimedia.org/r/465547 which will add appropriate JSON-LD links from article pages to the JSON-LD format data from wikidata (via content negotiation using the HTTP Accept header). Cscott (talk) 14:49, 16 October 2018 (UTC)
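
For anyone who wants to test, here is a small Python sketch of the two ways of asking for JSON-LD described above: the `.jsonld` suffix, or content negotiation with the `Accept: application/ld+json` header. It only prepares the requests against the beta URL given in the announcement; the helper name `jsonld_request` is made up for this example, and actually sending the request (for instance with `urllib.request.urlopen`) is left to the reader.

```python
from urllib.request import Request

# Base URL taken from the announcement above (beta cluster).
BETA_ENTITY_DATA = "https://wikidata.beta.wmflabs.org/wiki/Special:EntityData/"


def jsonld_request(entity_id, use_suffix=False):
    """Prepare (but do not send) a request for the JSON-LD form of an entity."""
    if use_suffix:
        # Format chosen explicitly through the .jsonld suffix.
        return Request(BETA_ENTITY_DATA + entity_id + ".jsonld")
    # Format chosen through content negotiation.
    return Request(
        BETA_ENTITY_DATA + entity_id,
        headers={"Accept": "application/ld+json"},
    )


negotiated = jsonld_request("Q64")
suffixed = jsonld_request("Q64", use_suffix=True)
print(negotiated.full_url, negotiated.get_header("Accept"))
print(suffixed.full_url)
```

Both requests should lead to the same document; the suffix form is handy in a browser, while the header form suits scripts that already negotiate formats.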