Wikidata:Project chat: Difference between revisions

:::Yes, they are searchable with <code>"insource:"</code>. It searches the raw unformatted text. I changed the wording here and the RFD, calling it a link was incorrect, it really is a ''hidden annotation''. [[User:Richard Arthur Norton (1958- )|RAN]] ([[User talk:Richard Arthur Norton (1958- )|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 16:09, 10 May 2018 (UTC)
:::: It seems to be about both linking and showing the QIDs in the code - the latter is a fallback that some people have been using to preserve the info in the case that the links aren't allowed. Thanks. [[User:Mike Peel|Mike Peel]] ([[User talk:Mike Peel|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 17:05, 10 May 2018 (UTC)

:As a Wikipedia editor first and Wikidata editor second, I am glad for this tip. I don't follow MOS talks as much as I used to do, so I would have missed this discussion. [[User:Syced|Syced]] ([[User talk:Syced|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 06:32, 11 May 2018 (UTC)


== Request for a bot operator to run a bot ==

Wikidata project chat
Place used to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
Please take a look at the frequently asked questions to see if your question has already been answered.
Please use {{Q}} or {{P}} the first time you mention an item or property, respectively.
Requests for deletions can be made here. Merging instructions can be found here.
IRC channel: #wikidata
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/06.

Enwiki RFC raises concerns

There's a discussion going on at enwiki related to Wikidata, at en:Wikipedia:Wikidata/2018 Infobox RfC.

  • Some editors have pointed out that editors on Wikidata can replace sourced info with incorrect data. How does Wikidata check for that?
  • Another concern raised is spamming by humans and bots on Wikidata. How is that kept in check here?

Capankajsmilyo (talk) 04:56, 18 April 2018 (UTC)[reply]

Re monitoring changes to sourced statements: Wikidata_talk:Abuse_filter#Tag_changes_to_sourced_statements. --Yair rand (talk) 06:11, 18 April 2018 (UTC)[reply]
This concern has been & will continue to be raised every 4 months or so on language wikipedias, until watchlists / histories on language wikipedias are able to show changes to an article arising out of changes in Wikidata. Is there a Phab ticket for that? fwiw, my experience of screwing up quickstatement runs is that I get an angry Wikidata person knocking on my talk page within minutes, so in that respect we work like language wikipedias: people watch items and bark when idiots like me bork them. --Tagishsimon (talk) 12:28, 18 April 2018 (UTC)[reply]
Wikipedia watchlist integration has already improved a lot; it might be worth testing again. As far as I know it is not yet complete and there are some issues to resolve (e.g. occasional delayed integration), but the amount and quality of listed edits are meanwhile pretty good. —MisterSynergy (talk) 12:51, 18 April 2018 (UTC)[reply]
I don't know which watchlist you are talking about. The one I just tested was flooded with "linked to this/that language" and "removed from this/that language". That's definitely NOT what enwiki editors want. Capankajsmilyo (talk) 11:41, 19 April 2018 (UTC)[reply]
I'm talking about the Wikidata integration in the standard watchlist (activated by the “Show Wikidata edits in your watchlist” checkbox at en:Special:Preferences#mw-prefsection-watchlist). On my 8000+ article enwiki watchlist, I see 10 Wikidata edits among the past 250 edits listed. This is a typical load these days at enwiki, and the numbers on my (smaller) dewiki watchlist are very similar. If you refer to interwiki links: those were shown as regular edits in pre-Wikidata times as well, and many editors do indeed care about them, as they are listed next to the article. —MisterSynergy (talk) 11:59, 19 April 2018 (UTC)[reply]
I didn't actually know watchlist integration was as good as it is (having turned it on). Maybe language wikis need to think about turning wikidata changes on by default? But I still see nothing in page histories. --Tagishsimon (talk) 21:25, 19 April 2018 (UTC)[reply]


Perhaps one remark:
One main concern is that WD allows the value of a statement to be modified without any check against the linked source. If a vandal changes the value of a sourced statement, then the argument that WD data can be filtered by looking only at sourced statements fails. This criticism came from a contributor who develops a system to track modifications against a referenced version of an article.
So if someone changes the value of a statement, the corresponding source has to be deleted along with it.
Then the usual criticism is that WD will be vandalized more than the WPs once all WPs draw their data from this single source. Snipre (talk) 13:33, 19 April 2018 (UTC)[reply]
@Capankajsmilyo: For your question about spamming, can you explain to us how wp:en deals with spamming in its articles? WP contributors always have higher expectations of WD than the ones they accept for their own WP. Snipre (talk) 13:37, 19 April 2018 (UTC)[reply]
Enwiki uses techniques like sanctions, user blocks, or page protection to deal with spammers. See en:Wikipedia:Spam for the guidelines. Capankajsmilyo (talk) 14:46, 19 April 2018 (UTC)[reply]
@Capankajsmilyo: Yup. Same. [1]. Wikidata is not the wild-west some language wiki users think it is. --Tagishsimon (talk) 21:25, 19 April 2018 (UTC)[reply]
@Lea Lacroix (WMDE): Can you give us some feedback on the technical feasibility of restricting the editing of WD items to registered users only, while still allowing IPs to contribute on talk pages and community pages? Snipre (talk) 13:40, 19 April 2018 (UTC)[reply]
Not a chance that would be supported. Wikidata is a wiki. --Yair rand (talk) 19:53, 19 April 2018 (UTC)[reply]
@Yair rand: Registration doesn't prevent anyone from editing. We really need to balance data-protection measures against freedom of contribution. If WD continues to have a negative reputation among WP contributors, this can deter more WP contributors from contributing to WD and from using WD data in WP.
Saying that WD is a wiki is not an argument when defining requirements for contribution rights: "wiki" doesn't mean that contributions have to be possible from an IP. Registration is a one-time action, so it makes no sense to say it requires a lot of time, and registration can be anonymous, with no need to provide a real identity. Why should IPs be able to contribute, especially in WD, where most contributions are done by bots or at large scale and identification of contributors is necessary? Snipre (talk) 11:37, 20 April 2018 (UTC)[reply]
There are ~27,000 IP edits per day in English Wikipedia, and ~1,900 IP edits per day in Wikidata (both in 2018 according to Wikiscan). Why should we allow the former, but not the latter? —MisterSynergy (talk) 11:45, 20 April 2018 (UTC)[reply]
@MisterSynergy:
  1. Because the number of IP edits in WD is low compared to those of bots and registered users, the loss won't be critical, and we can expect that some IP editors will create an account to continue contributing.
  2. Data modeling plays a critical role in WD, as does strict respect of the data format (correct use of properties, addition of the right qualifiers, complete description of sources, ...), and for that kind of work bots are more efficient than individual contributors. IP contributors carry a high risk of adding data in the wrong format, leading to useless contributions due to lack of knowledge.
  3. One edit in WD can impact several dozen articles, or even more, across different WPs. It is necessary to have a real possibility of contacting the editor when conflicting data are generated, especially if the sources are not online.
  4. Registration is a time-consuming effort for vandals compared to the effort of modifying one statement; this will discourage most vandals, as it requires more actions and effort to commit a single act of vandalism.
  5. It will be easier to track a vandal's edits by username and revert them, and once a contributor is considered a vandal, the account block is permanent, unlike IP blocking, which is undermined by dynamic IPs. Snipre (talk) 22:13, 20 April 2018 (UTC)[reply]

Technical feasibility is not the problem here. Instead of preventing part of the editors from contributing, we should rather continue improving the quality on Wikidata through tools to check data and sources, to patrol, and to give a better overview of the data's quality. That's what the editors and the development team are already working on. Lea Lacroix (WMDE) (talk) 15:02, 21 April 2018 (UTC)[reply]

@Lea Lacroix (WMDE): The problem is that WP contributors don't accept "we will do something" or "we are doing something" as valid arguments. They want to see what is currently in place to prevent vandalism in WD from spreading into the WPs that use WD data. The main problem is that to handle all the WPs' requirements in terms of data monitoring, the corresponding control effort in WD is huge.
Expecting that patrolling will provide a sufficient answer is an error: what is the ratio of modifications to active contributors in WD? The same for watchlists: what is the ratio of items to active contributors? And as for regularly checking WD data against the original databases, only online, open databases can be checked systematically; if someone uses a book as a reference, there is no way to perform an automatic check.
We were trying to show that WD offers a set of well-sourced data and that it is possible to filter/extract that data. The answer from WP users: how do you prevent someone from changing a value without modifying the source?
At some point we have to balance the need for stronger restrictions to protect data, in order to provide enough guarantees about data quality, against the wish to let everyone do what they want, at the risk of WD being rejected by its users. Snipre (talk) 15:07, 24 April 2018 (UTC)[reply]
  • @Snipre: I understand your concerns. But at the same time, in my personal experience, I feel it is not necessarily true that "vandalism ... spreads into WPs using WD data". I suppose the vandalism rate varies by property (or genre). At least in the area I'm editing, the vandalism rate in WD is extremely low. I've been checking anatomy-related identifier properties in WD for some years (e.g. Terminologia Anatomica 98 ID (P1323), Foundational Model of Anatomy ID (P1402)). There are roughly 10,000 data points in total, and the data is now used in roughly 50 different language editions of Wikipedia. Among those, over several years, I know of only one instance of apparent vandalism. I can't recall the exact item, but it was a sexual-organ item: one value was overwritten with text like "my d*ck is big lolol" or something like that. The format constraint flagged that edit, so it was smoothly reverted. So I think the property (or genre) makes a huge difference in the vandalism rate in WD, just as in Wikipedia. --Was a bee (talk) 10:29, 29 April 2018 (UTC)[reply]
  • Although it is not a smart approach but a brute-force one, repeatedly editing with QuickStatements from a locally validated Excel file is a simple way to keep WD data in perfect quality, because QuickStatements skips an edit if the exact same data already exists on the target page. --Was a bee (talk) 11:12, 29 April 2018 (UTC)[reply]
@Was a bee: Sure, nobody has proven so far that the vandalism rate is higher in WD than in WP, but when you speak with WP contributors, they will always show you examples of wrong data or vandalism that were not corrected even after a certain time. Yes, some tools exist in WD, especially the constraints system, but who is taking care of that system? Currently I have the feeling that people create constraints but don't curate the output.
There is a second criticism from WP contributors: we have a solid WD community, but if we calculate the ratio of total items to total active contributors, we have a very low monitoring capacity. That's a fact, so we have to show our automatic monitoring systems in action, and especially our capacity to process the output of those control systems, such as constraint violation reports.
My feeling is that WD is composed mainly of individual contributors, and we have very few wikiprojects with enough contributors to monitor certain item classes (in my area, we are 2-4 contributors for more than 100,000 items); we struggle to establish a data model, and we don't curate data at a sufficient rate to keep data quality at a good level. The critical element is to have bots or automatic scripts performing checks and reporting modeling errors, or comparing WD data with external data sets to detect wrong data. Snipre (talk) 07:48, 30 April 2018 (UTC)[reply]
@Snipre: Yes, bots are one good tool for maintenance. If there is a task-specific auto-run bot, that's good. But at the same time, the basic reason a custom bot is needed for data maintenance in WP is that WP doesn't provide any basic functionality for maintaining large volumes of data. We cannot download infobox data from WP, so we can't assess, cross-check, or update WP infobox data against external sources without a custom bot program. In WD, the situation is different. For example:
  1. All data is downloadable in spreadsheet format (openable in w:Microsoft Excel) as a default functionality. (For example, run this query, wait 30 seconds or so, and a CSV file is downloadable from the "Download" button at middle right.)
  2. If one has downloaded WD data in spreadsheet format, it is not difficult to compare it with another Excel file. I personally use two Excel functions for that purpose, VLOOKUP and EXACT (a script version of this comparison is sketched after this list). If one doesn't like comparing locally, there is also a tool that enables comparison online (Mix'n'match), after uploading the Excel file to it.
  3. Batch editing (hundreds, thousands of edits, or more) from spreadsheet data is possible with QuickStatements. To use QuickStatements, there is no need to learn a programming language like Python or C++, and no need to install PHP or Java on one's own computer. From spreadsheet data alone, one can edit large volumes of data in WD.
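
To make point 2 concrete, here is a minimal Python sketch of that spreadsheet comparison; the file names and column names are hypothetical placeholders to adapt to the actual exports:

import csv

def load_map(path, key_col, value_col):
    # Read a CSV export into a {item: identifier} dict.
    with open(path, newline='', encoding='utf-8') as f:
        return {row[key_col]: row[value_col] for row in csv.DictReader(f)}

wd = load_map('wikidata_export.csv', 'item', 'id')     # query service export
ext = load_map('external_source.csv', 'item', 'id')    # reference data set

for item, value in wd.items():
    if item not in ext:
        print(item, 'missing from external source')
    elif ext[item] != value:  # exact comparison, like Excel's EXACT
        print(item, 'differs: WD has', value, '- source has', ext[item])
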
[Figure: number of properties by data type]
For these reasons, my personal feeling is that field-specific automated bots are not strictly needed to keep quality in WD (though it is surely good if they exist). My thought is simple: compare WP and WD on the criteria of quality, quantity and maintainability. If WD is better, use WD; if WP is better, use WP. In my current view, if the data type is external identifier, WD is better than WP on these criteria in most cases. As additional info, external identifiers are currently WD's main product: there are roughly 4,500 properties in WD (Wikidata:Database_reports/List_of_properties/all), and about 2,600 of them are external identifiers. :)
And for identifiers, the most important constraint is the format constraint. If identifier data violates the format constraint, it is 100% wrong data, so this is the most crucial constraint for identifiers. When I look at the report pages of various identifier properties, it is not rare that the format violation count is zero even when the data count is 10,000 or 100,000. I think this fact is one piece of evidence for why WD is good :D --Was a bee (talk) 09:11, 3 May 2018 (UTC)[reply]
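
A minimal Python sketch of that format check, for anyone curious; the regex is illustrative only (roughly TA98-shaped), not the property's actual constraint pattern, and the items/values are made up:

import re

FORMAT = re.compile(r'^A\d{2}\.\d\.\d{2}\.\d{3}$')  # assumed identifier pattern

values = {'Q111': 'A01.1.00.001',   # hypothetical item, well-formed value
          'Q222': 'not an ID'}      # the kind of bad value described above
for item, value in values.items():
    if not FORMAT.match(value):
        print(item, 'format violation:', repr(value))
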

Limit page creation and edit rate

Hello all,

Following the dispatch-lag problems that we encountered over the past months, we decided to set limits on the speed of edits and page creations. These limits are now enforced for all accounts (including bots):

  • page creation: max 40 per minute per account
  • edit: max 80 per minute per account

This means that some bots and automated tools have to reduce the speed of their edits in order not to hit these limits.

We will continue monitoring the situation carefully and see what impact this has on the projects. If you want to learn more about the reasons for this change, you can have a look at the ticket.

Let us know if you encounter any problems. Lea Lacroix (WMDE) (talk) 08:29, 19 April 2018 (UTC)[reply]
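
For bot and tool authors wondering how to comply, a minimal client-side throttle might look like the following Python sketch; save_item and the list of pending edits are hypothetical placeholders for your own bot logic:

import time

EDITS_PER_MINUTE = 80
MIN_INTERVAL = 60.0 / EDITS_PER_MINUTE  # 0.75 s between writes

def run_edits(edits, save_item):
    last = 0.0
    for edit in edits:
        wait = MIN_INTERVAL - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)  # pace the writes to stay under the limit
        last = time.monotonic()
        save_item(edit)       # one write via your API client of choice
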

  • Doesn't seem convenient for regular editors. Didn't we have problems mainly due to bots creating new items at a steady speed for several days in a row?
    --- Jura 12:31, 21 April 2018 (UTC)[reply]
    • In the past weeks, we have repeatedly seen individual edit rates of up to at least 400/min, often ongoing over several hours. Apparently editors can’t overcome the temptation of starting several QuickStatements batches in parallel. (I don’t like the limitation either, but I have no idea how else to approach the problem.) —MisterSynergy (talk) 13:05, 21 April 2018 (UTC)[reply]
      • It's hard to add a second batch after the first one. Besides, this also affects PetScan. With PetScan, the normal thing to do is to start one, prepare another one, start that, etc., then wait till they all complete. Some might just do 10 or 20 edits per run, but the peak adds up. Looking at the edits of WMF staff on Special:Log/massmessage, it seems that the peak rate you mention isn't a problem as such. I think Wikidata's dispatcher even groups them together ..
        --- Jura 05:20, 22 April 2018 (UTC)[reply]
        • Would it be feasible to have limits like 5000/hr or 2500/30min, which would be quantitatively very similar to 80/min? This would allow exceeding the limit for short periods, but in the long run one would not be able to edit faster than ~80/min. —MisterSynergy (talk) 07:01, 22 April 2018 (UTC)[reply]
          • Yeah it looks like the current state isn't what we were trying to achieve so we'll tweak it more. We'll start with an increase of the timespan to 3 minutes and see how that goes. I am hesitant to go to 30 mins or an hour because then we'll again make it easy to do 400 edits in a minute causing all the problems we've seen in the past :/ --Lydia Pintscher (WMDE) (talk) 08:01, 23 April 2018 (UTC)[reply]

"Versand von „Wikidata weekly summary #309“ nach User talk:*Youngjin fehlgeschlagen mit dem Fehlercode ratelimited". Hm, we are blocking our workbase ;) . Regards, Conny (talk) 09:22, 24 April 2018 (UTC).[reply]

  • What's the medium term (3-6 months) plan for the root problem? It seems that various bot operators get requests to slow down even when they stay within the new rate.
    --- Jura 06:23, 30 April 2018 (UTC)[reply]
  • How about tweaking the feature so that the edit limit only kicks in if the server is actually overloaded? ChristianKl 12:43, 7 May 2018 (UTC)[reply]
    • I was going to suggest that this be lifted for periods of low edit rates (one could imagine some bots just uploading 100,000 additions in a low-traffic period).
      Looking at some of the grafana reports, either I didn't look at the right report, or I had to conclude that edits are already somewhat spread out. In the latter case, Wikidata might just almost always be at its maximum capacity.
      --- Jura 07:10, 8 May 2018 (UTC)[reply]

Year and Calendar Year

Currently, 2015 (Q2002), 2018 (Q25291), 2020 (Q25337) and other year items are instances of year (Q577), but year (Q577) is an instance of unit of time (Q1790144), so I think it is an inappropriate value for instance of (P31) in year items. Because 2018 (Q25291) is not a unit of time (Q1790144), but may be an instance of time interval (Q186081) or point in time (Q186408), I think each year item should be a subclass of calendar year (Q3186692), and calendar year (Q3186692) should not be a subclass of year (Q577), but a subclass of time interval (Q186081) or point in time (Q186408).

I drew a plan for the data model, as shown in the figure. Do you have any thoughts?--Okkn (talk) 11:50, 24 April 2018 (UTC)[reply]

  • If you have items for the years (AD/CE) 2016, 2017, 2018, etc., then I think they should be instances of the calendar they are part of and not of Q577.
    The problem with Q577 might be that eventually you will have an article in the sitelinks that covers both (or has some appendix that also covers the calendar year), and you will end up debating with some contributor from that wiki about whether it should be a class or an instance of one or the other, or whether there should be a qualifier "except in xzwiki" or "applies to part" "enwiki" (and other things that have nothing to do with the question you are asking).
    --- Jura 12:16, 24 April 2018 (UTC)[reply]
  • I share Jura's concern. In the case of the English Wikipedia, there are two articles, w:Year and w:Calendar year, so it is obvious which article should be linked to which item, even though the English Wikipedia article "Year" does briefly describe calendar years. I don't know how this will work out in other languages. Jc3s5h (talk) 13:12, 24 April 2018 (UTC) Ghouston (talk) 05:39, 25 April 2018 (UTC)[reply]
  • I have changed the durations of common year and leap year, expressed in seconds, by adding ±2 seconds. This allows for the possibility of positive or negative leap seconds in June or December. Of course, the chance of a negative leap second, or of two leap seconds in the same year, is remote. The standard defining leap seconds gives preference to June and December but allows a leap second in any month; still, I think the chance of 3 leap seconds in 1 year is so small we can neglect it.
Also, I question the opposite of (P461) properties on common year and leap year. Yes, a Gregorian or Julian calendar year must be one or the other of leap year and common year, but they are not opposites in the sense that good is the opposite of evil. Jc3s5h (talk) 17:47, 29 April 2018 (UTC)[reply]
In my opinion, plus-minus signs should not be used subjectively. In this case, the duration of a year can be (365 days + 1 second), but it can't be (365 days - 1 second). Is the use of "±" valid here? And the reason for choosing the value "2" is also a little subjective.
Common years are sometimes called "non-leap years", so I think they are the opposite of leap years. --Okkn (talk) 15:55, 30 April 2018 (UTC)[reply]
As far as I know, use of the plus-minus sign is the only way to add an uncertainty in the user interface. If you know a way to enter 31,536,000 -0 +2, I would like to learn about it. Also, negative leap seconds are allowed by the standard, although they are not expected to occur. (The second sequence would be 57, 58, 0, 1.) Evaluating uncertainty always involves a subjective judgement about which events are just too outlandish to include. We could write 31,536,000 -31,536,000 +2, allowing for the end of the world just as the new year begins. Jc3s5h (talk) 17:09, 30 April 2018 (UTC)[reply]
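
For what it's worth, the underlying data model does allow asymmetric uncertainties even though the default UI only offers "±": a quantity datavalue carries independent lowerBound and upperBound fields. A Python sketch of the JSON one could pass to the wbcreateclaim API follows; the unit URI for "second" (Q11574) and the exact usage are assumptions based on the documented quantity format:

import json

# 31,536,000 s -0 +2: an amount with asymmetric bounds.
value = {
    "amount": "+31536000",
    "unit": "http://www.wikidata.org/entity/Q11574",  # second (assumed unit item)
    "lowerBound": "+31536000",  # -0
    "upperBound": "+31536002",  # +2
}
print(json.dumps(value))  # serialize and pass as wbcreateclaim's "value" parameter
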
  • @Okkn: @Jura1: I was pleased to see that items like 1 BC (Q25299) have aliases indicating their names in astronomical year numbering (1 BC is a.k.a. 0, 2 BC is a.k.a. −1, etc.). This emphasizes what period of time the year covered, rather than how the name of the year is written. But the description for year BC (Q29964144) states "any year item that is suffixed by B.C or B.C.E", which puts the emphasis on how the name of the year was written. Perhaps the description should be something like "any year of the Roman, Julian or Gregorian calendar before AD 1."
For the sake of symmetry, I support Jura's suggestion. Jc3s5h (talk) 21:10, 29 April 2018 (UTC)[reply]
  • I would say a year BC is an instance of both years BC and one of leap year or common year (in the Julian calendar, or just calendar year before AD 8, because leap status is uncertain or not applicable before then). Also, I thought we didn't declare an item to be an instance of a superclass where a chain of transitive superclasses would lead to the superclass in question; am I right? Jc3s5h (talk) 17:32, 30 April 2018 (UTC)[reply]
  • As we have seen in this discussion, the superstructure can be quite convoluted and subject to frequent changes. So generally it's easier to figure out which class(es) an item should be an instance of and then work upwards from there.
    --- Jura 17:39, 30 April 2018 (UTC)[reply]

How can we document this?

This is an interesting discussion, but how can we make sure that 5, 10, 20 years from now someone can know why it was done this way? Should we consider the creation of a property to link statements to discussions like this one? Or would it be interesting to have annotations for the statements? It is not only about this particular case; there are many occasions where it is not clear why an editor has chosen a particular statement.--Micru (talk) 14:12, 30 April 2018 (UTC)[reply]

I think it's important to have some mention of the documentation on the item page of the user interface; nobody ever reads the talk page. Even so, for the time being, we should put links to some documentation on the relevant talk pages. There should also be a link from Help:Dates.
In this case, most of the affected items are about years and calendars, so a single page in the Help: space (or a relevant Wikiproject) would do; it would be linked from the relevant places. In the more general case, it would be good to have a way to document the rationale for a single property within a single item. Jc3s5h (talk) 15:34, 30 April 2018 (UTC)[reply]

Sport ontology

Where can I find an ontology description for items related to sports? Wikidata:WikiProject Sports lists the common properties for people, teams and competitions. However, I can't find information about sports season (Q27020041) or sports season of a sports club (Q1539532). The examples I found, like 2013–14 Liverpool F.C. season (Q13108085) or 1993–94 Cleveland Cavaliers season (Q282856), are really poor in statements. Thanks, Amadalvarez (talk) 14:12, 28 April 2018 (UTC)[reply]

@Amadalvarez: there is no general sports ontology, unfortunately. Reasons include: (1) types of sports are quite different from each other, thus they may not be covered by the same ontology; (2) lots of items are here due to the sitelinks to Wikipedias; if we were free of them, some things would have been organized differently (i.e. better) than what we can do now; (3) understanding and handling concepts like sports competition (Q13406554), sporting event (Q16510064), sports league (Q623109), sports season (Q27020041) and a couple of others is something which many editors find difficult to do.
If you have specific questions, I can give some advice what to do and what not to do, based on the situation in other items that are instance of sports season of a sports club (Q1539532), and my experience. —MisterSynergy (talk) 06:14, 4 May 2018 (UTC)[reply]
Thanks @MisterSynergy:. I've been working on fully WD-powered infoboxes in cawiki, gathering into a few "core infoboxes" the large number of them created over the life of WP, and I agree with your description. However, in order to avoid repeating the "everyone's own free style" situation here in WD, I'd rather ask than reinvent the wheel. After asking here, I left a message with specific questions at User_talk:Xaris333#Sport_ontology. Some have responses, some others not yet. So I invite you to take a glance and share your experience/opinion there with Xaris, who has created several sport-related items. Thanks again. Amadalvarez (talk) 11:42, 4 May 2018 (UTC)[reply]
I plan to have a look at that discussion later today. —MisterSynergy (talk) 14:17, 4 May 2018 (UTC)[reply]

Request for assistance

I'm not quite sure if this is the appropriate place to post this request or not, but I'm writing to request help from a Wikidata administrator or other appropriate authority. (If this isn't the correct place to post this request, my apologies; I'd appreciate it if someone could direct me to the appropriate place. My apologies in advance for this long message, but I need to give context about the problem and the actions I've taken to try to resolve it.)

Someone deleted my User Page (User:47thPennVols) from Wikidata on April 7, 2018, indicating that it was "Out of project scope", but did so without reaching out to me first via my Talk page to provide any guidance to me or to advise me that he/she would be deleting my User Page. I've tried reaching out to this Wikidata user, who is apparently a Wikidata administrator (via that user's talk page), to find out why my User Page was deleted, but have received no response. (I'm not posting the Wikidata administrator's user name here because I'm not upset with that administrator or trying to get him/her in trouble in any way. I just need help getting my User Page restored because I believe that it should not have been deleted.)

As a bit of background, I only discovered the deletion of my User Page while working on a bio article for the English version of Wikipedia (as part of the April 2018 drive by Women in Red to increase the number of women's biographies on Wikipedia). I had located the correct given name for my bio subject (who was only listed in Wikidata by her nickname), and thought I should add her correct given name to her Wikidata entry (Q24009728) to help other Wikipedians who might be researching her life. That's when I noticed that a Wikidata administrator had deleted my user page roughly three weeks earlier. I've been a member of Wikipedia since 2015 (and have been writing articles on and off since then as User:47thPennVols), and always try to do the right thing but, like many fellow Wikipedians, am still learning all of the ins and outs of Wikipedia procedures. I hadn't made any changes to any Wikidata entries prior to this year, but began doing so by making minor edits because I had found info that wasn't available on Wikidata, and thought it might be helpful to other Wikipedia researchers. (These minor edits are supported by primary sources which are included in the English-Wikipedia bios I've written.) I find myself wondering if I accidentally committed some sort of violation of a Wikidata rule, which might have prompted the Wikidata administrator to delete my User Page, but I have no way of knowing because that administrator didn't provide any warning or explanation.

I would appreciate it if a Wikidata administrator could communicate with me directly so that I know why my User Page was deleted and whether or not it's possible for that deletion to be undone. (If it can, then I'd also appreciate it if a Wikidata administrator could revert that deletion.) Thank you in advance for your help. 47thPennVols (talk) 17:00, 28 April 2018 (UTC)[reply]

  • @47thPennVols: I am not the admin who deleted it, but I have restored your user page. You should link it to your user page on English Wikipedia, and add babel boxes to it. --Okkn (talk) 18:38, 28 April 2018 (UTC)[reply]
    • @Okkn: Thank you so much for your help. How do I link my Wikidata User Page to my English Wikipedia page? (Although I'm not new to Wikipedia, I'm relatively new to Wikidata, and am still trying to figure out the similarities and differences between the two.) 47thPennVols (talk) 19:15, 28 April 2018 (UTC)[reply]
Just a link in the form 47thPennVols, for instance. --Tagishsimon (talk) 21:05, 28 April 2018 (UTC)[reply]
@Tagishsimon: Thank you so much for your help! (My apologies for the delay in responding. I've had my "nose to the grindstone", researching and editing bios for the WikiProject Women in Red over on English Wikipedia, and only just saw your message this morning.) 47thPennVols (talk) 15:55, 4 May 2018 (UTC)[reply]
@ChristianKl: Although I never received a response from Romaine re: why he/she deleted my User Page for Wikidata, @Okkn: was kind enough to restore it for me. (Thank you, again, Okkn.) Kind Regards. 47thPennVols (talk) 15:55, 4 May 2018 (UTC)[reply]
The only reasons why I would delete a user page are on request or when it is spam. Looking back in the logs, I see that another user marked your user page as out of project scope, and in a quick review of that deletion request I might have found your user page looking relatively similar to the many, many spam user pages that have been created, or to attempts by people to write a Wikipedia article on the wrong wiki. I realise now that I made this review by mistake; my apologies for that. Romaine (talk) 00:52, 5 May 2018 (UTC)[reply]

Identify face in image

To improve the quality of images added, is it possible to get a list of Qids (humans) whose image doesn't have any face in it? Capankajsmilyo (talk) 03:13, 29 April 2018 (UTC)[reply]

There are plenty of open-source AI libraries which can identify a face in an image with high precision. So we could use one of them to tag Q items with wrong images. Can this be applied? Capankajsmilyo (talk) 19:44, 2 May 2018 (UTC)[reply]
You could try to use that to add a qualifier that tags them as pass or flag for review.
--- Jura 21:45, 2 May 2018 (UTC)[reply]
The AI libraries are in Python, and Wikipedia seems to use Lua. How do we move ahead? Capankajsmilyo (talk) 04:04, 5 May 2018 (UTC)[reply]
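
One possible way ahead, sketched under assumptions: the check would run offline as a batch job against downloaded images (rather than in Lua on-wiki), for example with OpenCV's bundled Haar cascade. The image path here is a hypothetical downloaded P18 file:

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def has_face(path):
    img = cv2.imread(path)
    if img is None:
        return False  # unreadable file
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

print(has_face('downloaded_p18_image.jpg'))  # flag items where this is False
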

Concepts used in a process

I have a reliable source that says fulling (Q1585730) of cloth uses "moisture, heat, pressure, and friction". I have added <uses> moisture (Q217651), heat (Q44432), pressure (Q39552) and friction (Q82580), but the constraints suggest that each of these concepts should have the inverse statement <used by>. Do we really want heat (Q44432) to have a list of every process in every discipline that uses heat? Please advise before I head down that path. - PKM (talk) 18:58, 30 April 2018 (UTC)[reply]

I don't think the inverse constraint makes sense in this case. ChristianKl14:19, 4 May 2018 (UTC)[reply]
Agreed, they can be considered inverses, but it should not be enforced by constraint. ArthurPSmith (talk) 14:49, 4 May 2018 (UTC)[reply]
Agree to removal of the inverse constraint (Q21510855) from uses (P2283) which was added by Laddo with this edit! I've also put a note at the talk page. --Marsupium (talk) 21:07, 8 May 2018 (UTC)[reply]
Indeed, the inverse constraint is not appropriate. LaddΩ chat ;) 23:05, 8 May 2018 (UTC)[reply]

number of points/goals/set scored

Hello. A league has a number of matches played and a number of points/goals/sets scored. There is no problem with number of matches played/races/starts (P1350). But there is a constraint on number of points/goals/set scored (P1351): it must be used as a qualifier. So how can I add the information about the total goals scored in a league? Xaris333 (talk) 20:08, 30 April 2018 (UTC)[reply]

Notified participants of WikiProject Sport results; I think they should be notified of this discussion. --Sannita - not just another it.wiki sysop 08:55, 3 May 2018 (UTC)[reply]
@Xaris333: I removed the constraint for now, which was added on 1st of April. If there is a proposal how to deal with the ~1000 direct uses, we might want to re-add the constraint. —MisterSynergy (talk) 05:46, 4 May 2018 (UTC)[reply]

Award received

award received (P166) is now supposed to also have the inverse statement winner. As winner is marked as a Wikidata property related to sports events (Q28106586), how can this be corrected? I am bringing this matter up here because it will affect many items. Pmt (talk) 14:45, 1 May 2018 (UTC)[reply]

Generally not. Just imagine trying to add all the people who have been awarded the Legion of Honour (Q163700)! --Hsarrazin (talk) 06:47, 7 May 2018 (UTC)[reply]

Showcasing workflows on video

Since everyone has different ways of working on Wikidata, I was thinking that it would be cool to record some videos of editors showing how they collect data, use various tools, or just edit some items. Would anyone volunteer to be interviewed via Hangouts to show their secret sauce? It doesn't need to be long; whatever you feel like sharing is more than enough. Just an experiment :) --Micru (talk) 20:57, 1 May 2018 (UTC)[reply]

At WikidataCon, there were a number of folks who sat down and documented their Wikidata workflow for Jan Dittrich (WMDE). Some of the results can be seen in the Commons gallery here: commons:Category:Boards_of_WikidataCon_2017. That said, I'd be happy to try out a video-recorded session explaining some of the workflows I use. -- Fuzheado (talk) 21:03, 2 May 2018 (UTC)[reply]
@Fuzheado, Micru: I would be happy to take part in such an activity; working with languages I don't speak, I have developed many habits that have made handling items in said languages much easier. Mahir256 (talk) 04:47, 7 May 2018 (UTC)[reply]

Can someone press the button to create an approved new property?

Hi

I'm running an in-person project in the next few days which requires a new property, Wikidata:Property_proposal/Directory_of_Open_Access_Journals_ID (I started it almost 2 months ago), which has been approved and is ready to be created. Could someone who has the magic powers please press the button for me so the property is created? I'm sorry to try and jump the queue, but the project is stalled without it.

Thanks very much

--John Cummings (talk) 14:54, 2 May 2018 (UTC)[reply]

✓ Done--Micru (talk) 21:32, 2 May 2018 (UTC)[reply]

John Cummings, Micru, that is controversial, there was some strong opposition. Is it a repetition of ISSN? 77.179.61.171 20:43, 4 May 2018 (UTC)[reply]

As I explained in the discussion, DOAJ uses ISSN numbers as identifiers, but they are sometimes not an exact match, e.g. where multiple different format versions of the publication exist. It also allows for DOAJ to begin having pages for other kinds of entries that are not journals, e.g. publishers. --John Cummings (talk) 21:26, 4 May 2018 (UTC)[reply]
The decision appeared to be measured and with precedent, 77. What do you hope to achieve by rehashing it here? --Tagishsimon (talk) 21:33, 4 May 2018 (UTC)[reply]

How should I reference data copied from Commons?

Over time I have copied a lot of data from Commons templates. I often used imported from Wikimedia project (P143)Commons Creator page (Q24731821) or similar as a reference. This time I will be transferring data from pages that have c:Template:Artwork, like c:Category:Large Figure in a Shelter - Henry Moore (LH 652c) or File:Diego Velázquez 050.jpg. I was thinking about referencing such transfers with imported from Wikimedia project (P143)Wikimedia Commons (Q565)reference URL (P854)url statements, but I cannot get reference URL (P854) to work with QuickStatements, as URLs with spaces do not seem to be compatible with QS. Would a reference like imported from Wikimedia project (P143)Wikimedia Commons (Q565)image (P18)"file name" or imported from Wikimedia project (P143)Wikimedia Commons (Q565)Commons category (P373)"category name" be OK? Or is there a better way to reference such transfers? --Jarekt (talk) 20:30, 2 May 2018 (UTC)[reply]

  • Wikimedia import URL (P4656) is meant for that.
    --- Jura 21:48, 2 May 2018 (UTC)[reply]
  • There is a big difference between a 'source' or 'import location' and a 'reference'. In these cases, you're sourcing/importing the info from Commons (it's where the info is currently held), but that's not the reference for that info (it's not where the authoritative information was provided). Can you follow the info back to the original reference and include that where available? Thanks. Mike Peel (talk)
As with the majority of statements supported by imported from Wikimedia project (P143) "references", it is usually not clear where the data came from. That was a big issue before Wikidata too. I know that actual external references are much more important, but imported from Wikimedia project (P143) / Wikimedia import URL (P4656) "references" at least tell you where the data was imported from and can help you dig out external references. User:Jura1, thanks for mentioning Wikimedia import URL (P4656); I did not know about it. But I might not be able to use it, as I was unable to get QuickStatements to work with URL-type statements passed through a URL (see Help:QuickStatements#Running_QuickStatements_through_URL), and that is the mechanism I use to speed up the import of individual statements from infoboxes on Commons. That is why I was wondering about using image (P18) / Commons category (P373) as reference qualifiers. --Jarekt (talk) 00:22, 3 May 2018 (UTC)[reply]

Q4115189 P31 Q5 S4656 "https://commons.wikimedia.org/wiki/Category:Large_Figure_in_a_Shelter_-_Henry_Moore_(LH_652c)"

The above gives version1 and version2. I know that some combinations don't work in the first version of QuickStatements, but do in the new one. If you really need it, maybe a "Wikimedia Commons import file" property could be created.
--- Jura 10:55, 4 May 2018 (UTC)[reply]
Jura, I finally think I know what is happening; see my bug report / feature request. I am trying to add the ability for c:Module:Artwork to create QS codes to pass artwork metadata that is present on Commons but missing on Wikidata, and at the moment I am testing it on the rare case where an "official" image of an artwork is defined on Commons but image (P18) is missing here. At the moment there are a few pages in c:Category:Artworks with Wikidata item: quick statements which can be used to test setting image (P18) through QS, but the tool is not able to add a reference. I will not deploy the ability to upload other statements until I figure out some acceptable reference/source I could add to them. In the case of c:Module:Creator I used imported from Wikimedia project (P143)Commons Creator page (Q24731821), and the item had Commons Creator page (P1472); similarly with c:Module:Institution. In the case of c:Module:Artwork I would like to provide a link to the actual page the information was copied from, so I am stuck at the moment. --Jarekt (talk) 15:02, 4 May 2018 (UTC)[reply]
I think the source url is missing quotes (and using the wrong property). I'd also skip the date if the date is today.
--- Jura 15:12, 4 May 2018 (UTC)[reply]
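
On the URL-with-spaces problem: a sketch of one likely fix in Python, percent-encoding the whole command before embedding it in the tool URL, reusing the Q4115189 example from above. The base URL and parameter style shown are assumptions; use whatever Help:QuickStatements#Running_QuickStatements_through_URL actually specifies:

from urllib.parse import quote

command = ('Q4115189\tP31\tQ5\tS4656\t'
           '"https://commons.wikimedia.org/wiki/Category:'
           'Large_Figure_in_a_Shelter_-_Henry_Moore_(LH_652c)"')

# Encode tabs, quotes, spaces, etc., so nothing breaks the URL.
url = 'https://tools.wmflabs.org/quickstatements/#/v1=' + quote(command, safe='')
print(url)
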

How do I find special characters in a big table that are messing up a Mix'n'match import?

Hi all

I'm trying to create a Mix'n'match import, but it fails because there is at least one special character or space hidden somewhere in the 4000 lines. Does anyone know any special tricks to find them? I think it could be a Cyrillic letter that looks very much like a Western letter, or something similarly non-obvious.

Thanks

--John Cummings (talk) 09:08, 3 May 2018 (UTC)[reply]

I would read the file in some programming language (I like Matlab), convert the characters to integers, and search for integers bigger than some threshold. There might be better ways. Can you post it somewhere in your sandbox? --Jarekt (talk) 11:46, 3 May 2018 (UTC)[reply]
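
A minimal Python version of exactly that approach, reporting line and column so the character can be located in the big table (the file name is a hypothetical placeholder; legitimate accented characters will show up too, the point is to get a short list to eyeball):

def find_suspicious(path, threshold=127):
    # Flag every character above the plain-ASCII range.
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, 1):
            for col, ch in enumerate(line, 1):
                if ord(ch) > threshold:
                    print('line', lineno, 'col', col, repr(ch),
                          'U+%04X' % ord(ch))

find_suspicious('mixnmatch_import.tsv')
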
You could try out this one. --Edgars2007 (talk) 14:04, 3 May 2018 (UTC)[reply]
Very handy indeed! Many thanks for the suggestions. The online tool did find a couple of strange characters, so definitely a thread to investigate. I'll compare with some previous imports that definitely worked and hopefully be able to tell whether it's a likely cause of the issue. Cheers again :) NavinoEvans (talk) 21:08, 3 May 2018 (UTC)[reply]

How to find Wikidata entries by GPS position?

Specifically, can I open a map, zoom in to a certain place, and get all Wikidata items displayed that have a proper GPS coordinate?

This would be very useful for identifying WD and WP items, because the name search often shows too many items, or gives no results at all because of language barriers.

Cheers Ceever (talk) 13:06, 3 May 2018 (UTC)[reply]

Hello, you can try WikiShootMe. It's also connected to Commons: the blue dots you see are pictures on Commons that have a location but are not connected to a Wikidata item. Lea Lacroix (WMDE) (talk) 13:59, 3 May 2018 (UTC)[reply]
I think this highlights a missing aspect of search on Wikidata: our search seems to be pretty much the generic Wikimedia search (though with some Wikidata-specific enhancements) based on labels and descriptions. What would be *REALLY* useful would be a property search whose behavior is modulated by datatype: for geographic coordinates, look for things "close" to a specific point; for dates, look in a time range; for external IDs, look for exact or partial (exact) matches; for quantities, search by range with unit conversion; for URLs we have the external links search (though that could be better too); etc. Do we have any phabricator tickets looking into making this easier? Yes, you can do all of the above with SPARQL, but... ArthurPSmith (talk) 18:00, 3 May 2018 (UTC)[reply]
For coordinates, we do have these kinds of maps (here: items about things around Berlin + 20 km radius). Technically the map could show all coordinates we have in Wikidata, but the Query Service might time out, and the client machine would probably not be able to render it properly due to the large amount of data. —MisterSynergy (talk) 05:51, 4 May 2018 (UTC)[reply]
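
For reference, a minimal sketch of that kind of radius query run against the query service from Python; the center point (Berlin) and the 20 km radius mirror the example above:

import requests

QUERY = '''
SELECT ?item ?itemLabel ?coords WHERE {
  SERVICE wikibase:around {
    ?item wdt:P625 ?coords .
    bd:serviceParam wikibase:center "Point(13.38 52.52)"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "20" .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
'''

r = requests.get('https://query.wikidata.org/sparql',
                 params={'query': QUERY, 'format': 'json'})
for row in r.json()['results']['bindings']:
    print(row['itemLabel']['value'], row['coords']['value'])
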
And, don't forget, the search doesn't even give results for simple text information stored in text fields other than labels and descriptions (e.g. postal address, birth name, etc.). That's really very annoying. --Anvilaquarius (talk) 10:19, 7 May 2018 (UTC)[reply]

Sex or gender in the context of athletic teams

While working on adding statements for Women's Division I basketball teams, I noticed that someone had added the property "sex or gender" with the value "female".

I was editing these myself, but then wondered whether this is appropriate, given that the team members may have that attribute but the team itself doesn't.

While working on Q21531595, I see that the property is there but it's followed by a potential issue:

Type constraint: entities using the sex or gender property should be instances of one of the following classes (or of one of their subclasses), but Omaha Mavericks women's basketball currently isn't: person, animal, character that may or may not be fictional, abstract being, fictional animal character, mythical entity.

That seems to match my thinking which is that this statement shouldn't be included for athletic teams. Before I go back and remove such entries, I thought I'd double check here to make sure my thinking is correct.--Sphilbrick (talk) 17:38, 3 May 2018 (UTC)[reply]

Thanks--Sphilbrick (talk) 20:36, 3 May 2018 (UTC)[reply]

Some questions about developer (P178)

Since it is a subproperty of creator (P170), developer (P178) should be used only for the software creator, not for the actual developer.

  • Shouldn't it be renamed to "software creator" in order to avoid confusion?
  • Shouldn't it have an inverse property to indicate what software has been created by a particular person or organization?
  • How do we indicate the actual developer if it is different from the software creator? With maintained by (P126)?--Malore (talk) 23:58, 3 May 2018 (UTC)[reply]
I'm not sure what you mean by "only for the software creator, not for the actual developer". Isn't a developer a creator? Is it even supposed to be used for software? It seems like programmer (P943) would be more specific. Ghouston (talk) 00:04, 4 May 2018 (UTC)[reply]
@Ghouston: You're right, programmer (P943) is more appropriate. However, they are ambiguous because:
  • You are suggesting that the "creator" is the person who creates the first version. I'm not sure that's the right interpretation. If person 1 creates version 1 and person 2 modifies it to create version 2, wouldn't we say that version 2 was jointly created by persons 1 and 2? developer (P178) is marked as a subproperty of creator (P170), which would mean that every developer is also a creator. Ghouston (talk) 07:30, 5 May 2018 (UTC)[reply]
@Ghouston: The creator is the person who creates the first version; the person who creates version 2 is the creator of version 2. Furthermore, IMO a programmer can contribute code without creating any version. Finally, I noted that the subject items of "developer" are "software developer" and "software house", so what is the difference between "developer" and "programmer"?--Malore (talk) 14:21, 6 May 2018 (UTC)[reply]
Normally I guess there will be a single Wikidata item for software with multiple versions, so it will often have multiple developers aka creators. There are developers of things other than software, like real estate projects, but developer (P178) claims to have subject of software developer (Q183888) and software company (Q1058914), so I suppose real estate developers aren't in scope. I'm not sure if the difference between a "programmer" and a "software developer" is anything other than branding, like the way organizations have come up with numerous job titles for the field over the years. enwiki currently has separate articles for the two, but they are marked as merge candidates. Ghouston (talk) 07:33, 7 May 2018 (UTC)[reply]
Perhaps a more useful distinction would be between a software publisher (which may be an organization or a person) and a software developer / programmer as a person. Ghouston (talk) 07:39, 7 May 2018 (UTC)[reply]
@Ghouston: I think you're right about the distinction. However, I still think that developer and programmer are ambiguous and that the developer of a specific version can't be considered the creator of the software.--Malore (talk) 12:03, 7 May 2018 (UTC)[reply]

Changes to languages spoken, written or signed (P1412) without prior discussion?

Could someone please point out where was it discussed/decided that languages spoken, written or signed (P1412) was to be changed from "languages spoken" (as it was created) to "languages spoken, written or signed" (as it is now)? Andreasm háblame / just talk to me 05:25, 4 May 2018 (UTC)[reply]

number of platform tracks (P1103) seems a likely parallel; there are Dutchisms in existing property names. --Liuxinyu970226 (talk) 05:28, 4 May 2018 (UTC)[reply]
It has an interesting history: non-native language spoken [2], language spoken [3], languages spoken [4], languages spoken or published [5], languages spoken or written [6], languages spoken, written or signed [7]. The latest version dates from 2016-06-03, so it seems overdue for another change. Ghouston (talk) 10:30, 4 May 2018 (UTC)[reply]
Well, I think someone pointed out that "Ancient Greek" isn't really spoken, but written. Then someone found that sign language (also included) isn't really written nor spoken. So we ended up with the current label.
--- Jura 10:44, 4 May 2018 (UTC)[reply]
I'm confused about whether symbolic and computer languages should be included. If not, can it be changed to "Natural languages spoken, written or signed"? :) Ghouston (talk) 10:52, 4 May 2018 (UTC)[reply]
I changed it from "languages spoken or written" to "languages spoken, written or signed" in this edit, because deaf people do not (generally) "speak" sign languages. I recall discussing it, but I can't find that discussion.  – The preceding unsigned comment was added by Pigsonthewing (talk • contribs) at 16:09, 4 May 2018 (UTC).[reply]
Probably here. —MisterSynergy (talk) 16:21, 4 May 2018 (UTC)[reply]

Adding colors as CMYK, Hex, and more

I would like to add school colors to the entries of athletic sports teams, e.g. Albany Great Danes women's basketball (Q29468771). I do see the property for color:

  • color (P462)

But my impression is that it takes values such as "purple" and "gold".

While I would like to include those values, many schools specify, in addition to those ordinary English words, values such as:

  • Pantone (PMS)
  • CMYK
  • RGB
  • Web/Hex

As an example see this page

I think I can express the RGB property using:

  • sRGB color hex triplet (P465)

But while I see Q values for:

  • Pantone Matching System (Q749816)
  • CMYK color model (Q166432)

I don't see the ability to express those as properties nor do I see any way to express the hex values. Am I missing something?--Sphilbrick (talk) 14:17, 4 May 2018 (UTC)[reply]

I suppose we could use <named as> "Pantone 19-3642" as a modifier for "purple", but perhaps we need a new property "Color specification" as a modifier for "color", with fields "color system" (Q-item) and "color id" (string). I would support such a property. - PKM (talk) 20:38, 4 May 2018 (UTC)[reply]
We have sRGB color hex triplet (P465) to indicate a color in the hexadecimal notation. --Pasleim (talk) 20:42, 4 May 2018 (UTC)[reply]

Yes, I erred in suggesting that the RGB property could be used to specify the RGB values. In fact, as you noticed, it is intended for the hex values.

I'll restate my comment using a specific example. One common color used by athletic teams can be expressed in five common ways:

  1. Color name: Purple
  2. PMS: 269
  3. CMYK: 78, 100, 0, 33
  4. RGB: 70, 22, 107
  5. sRGB color hex triplet: #46166b

There are properties for option 1 and option 5. I hoped I had simply overlooked properties for the other three, but if not, what's the next step to request such properties?--Sphilbrick (talk) 13:31, 5 May 2018 (UTC)[reply]

Options 3 and 4 can be computed from option 5 and vice versa. For option 2 you could request a new property here, but I'm quite skeptical about the legality of using PMS in a CC0 project. --Pasleim (talk) 11:46, 7 May 2018 (UTC)[reply]
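
A sketch of that conversion in Python, using the hex triplet from the example above. Note this is the plain mathematical model: published brand values such as "78, 100, 0, 33" come from profile-specific conversions and will not match it exactly.

def hex_to_rgb(h):
    h = h.lstrip('#')
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_cmyk(r, g, b):
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)
    if k == 1:
        return (0, 0, 0, 100)  # pure black
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return tuple(round(x * 100) for x in (c, m, y, k))

rgb = hex_to_rgb('#46166b')    # -> (70, 22, 107)
print(rgb, rgb_to_cmyk(*rgb))  # naive CMYK, in percent
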
Yes, regarding Pantone, that concern occurred to me as well. Guess I'll put this on hold a bit and work on something else. Thanks.--Sphilbrick (talk) 17:46, 9 May 2018 (UTC)[reply]

Flood flag

If I have to make about 8500 edits using QuickStatements (8 edits per item), should I request a flood flag? I did so last time, but I'm not sure now, as I have seen a few users making many such edits without any flag. It was a bit annoying because my watchlist was littered with pseudo-bot edits, but maybe this is typical here, and 8k edits is nothing big enough to bother the bureaucrats with? Wostr (talk) 17:09, 4 May 2018 (UTC) I made a request (1), but I'd appreciate info on whether one should request a flood flag in such situations or not. Wostr (talk) 20:10, 4 May 2018 (UTC)[reply]

Users[who?] mostly don't care and run QuickStatements without the flag. If you are still hesitant, run your batches in the background, i.e. via QuickStatementsBot. Matěj Suchánek (talk) 16:52, 6 May 2018 (UTC)[reply]

Wikitext highlighting out of beta

#AfricaGap - What % of humans is from Africa? - Politicians from Africa

The gender gap is one of our most successful projects: the gap has been closed by more than one percent. I sincerely doubt that even 1% of the humans we know of are from Africa. Africa is not my priority; that is Turkey (#WeMissTurkey) and its history. However, I have a small project where I create Listeria lists for the national politicians of African countries.

These lists show how much is missing. The history of these lists will show how the content develops. Where there are categories for a specific type of politician, I have included the property "category contains", so they can be queried by bots for additional properties in Wikidata.

The way Listeria lists work means that they can easily be copied to other Wikipedias. They will show the local articles, and the labels will be in the local language. This is a way to draw attention to the #AfricaGap.

Thanks, GerardM (talk) 08:46, 5 May 2018 (UTC)[reply]

Just to note, there is Wikidata:WikiProject every politician seeking to improve, at least, politicians ... not sure if you've come across it, GerardM? --Tagishsimon (talk) 14:47, 5 May 2018 (UTC)[reply]
Yes there is. This project of mine is a way to show what happens for Africa. It is where we are weakest. Thanks, GerardM (talk) 16:02, 5 May 2018 (UTC)[reply]

@GerardM, Tagishsimon: - Creation of items for humans should speed up. WD only has 4.6 mio. How about 10 mio by the end of the year? 77.180.81.191 17:01, 5 May 2018 (UTC)[reply]

I'd be interested to know what fraction of the humans on Wikidata are politicians, sportspeople or actors. These are the occupations where notability basically only requires employment. Ghouston (talk) 01:07, 6 May 2018 (UTC)[reply]
The tool Denelezh, even though dedicated to the gender gap, provides these metrics, based on the property occupation (P106): among 4,255,732 humans in Wikidata, 655,523 are athletes (athlete (Q2066131)), 400,676 are politicians (politician (Q82955)) and 212,460 are actors (actor (Q33999)). — Envlh (talk) 10:41, 6 May 2018 (UTC)[reply]
That's an interesting table, thanks. Ghouston (talk) 11:33, 6 May 2018 (UTC)[reply]
The tool requires a minimum number of people before it even recognises the people from a country. Many current African countries do not reach the threshold. Thanks, GerardM (talk) 12:00, 6 May 2018 (UTC)[reply]
As I mentioned here and on my blog, the Ottoman empire / Turkish history is what I concentrate on. It should be relatively easy to import all national politicians.. <grin> you could say to one of them, you are not notable when you are not on Wikidata </grin> GerardM (talk) 17:17, 5 May 2018 (UTC)[reply]
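For reference, counts like those Envlh quotes can be approximated with a query along these lines; a sketch only, counting direct occupation (P106) values, so items typed with subclasses (e.g. specific sports) are missed:

SELECT ?occ (COUNT(DISTINCT ?person) AS ?count) WHERE {
  VALUES ?occ { wd:Q82955 wd:Q2066131 wd:Q33999 }            # politician, athlete, actor
  ?person wdt:P106 ?occ .                                    # occupation
}
GROUP BY ?occ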

Merge

Merge en:Category:Israeli Air Force generals (Q8556730) with de:Kategorie:Kommandeur Luftstreitkräfte (Israel) --Isranoahsamue (talk) 21:16, 5 May 2018 (UTC)[reply]

Has been done. --Tagishsimon (talk) 21:50, 5 May 2018 (UTC)[reply]
General != Kommandeur. 80.171.248.200 05:21, 6 May 2018 (UTC)[reply]

Modifiers for "Material used" and similar concepts

In my work with clothing and textiles, I frequently come across sources that say a certain garment or fabric was "traditionally" or "originally" made of wool or silk or linen, but now is also made of other fibers, synthetics, or blends. I'd like a range of qualifiers to record this information that includes "originally", "traditionally", "usually", "often", "occasionally", "sometimes", "latterly", and perhaps "nowadays" (this is similar to refine date (P4241), but for concepts other than dates). I assume there are other domains than textiles that could use a set of modifiers like this (materials used for sports equipment come immediately to mind). Does this seem like a useful set of qualifiers, and what might we call it? - PKM (talk) 21:41, 5 May 2018 (UTC)[reply]

number of item mergers

I just checked and saw I have done around 200 item mergers so far (mainly wiki articles in different languages, some categories etc.). How many items have been merged so far on Wikidata, and do we have any assessment of how many more mergeable items we are missing? DGtal (talk) 07:48, 6 May 2018 (UTC)[reply]

We have around 1.78M redirected items as of today [8], which to my knowledge corresponds to the number of mergers we’ve done. No idea how to estimate the amount of duplicates that require a merge, to be honest. —MisterSynergy (talk) 08:14, 6 May 2018 (UTC)[reply]
Do we know what caused most of the duplicates? DGtal (talk) 09:50, 6 May 2018 (UTC)[reply]
SELECT ?value (COUNT(?item) AS ?count)
WHERE {
  ?item owl:sameAs ?tgt .   # redirected item and its merge target
  ?tgt wdt:P31 ?value .     # class of the target item
}
GROUP BY ?value
ORDER BY DESC(?count)
Try it!
According to this query, the most common redirects involve categories, people, taxa and disambiguation pages. Assuming that redirects correspond to mergers, these are the most common causes of duplicates. --Shinnin (talk) 10:36, 6 May 2018 (UTC)[reply]

geni.com - read data from Wikidata

What tool can I use, and how, to read the item name from Wikidata and fill it into this table? Second question: how can I add new data to the items there? Thank you very much! Regards, Conny (talk) 14:16, 6 May 2018 (UTC).[reply]

If there is more than one entry, it should be marked and manually updated. Conny (talk) 14:59, 6 May 2018 (UTC).[reply]

@Conny: You can get the query service to give you wikidata items matching the P2600 values - example report for 5 of the values - and then merge them into the table using a spreadsheet, for instance. Not sure what you mean by "how to add there new data in the Items?". I note there is no context for the table on the P2600 talkpage, so I'm not quite sure what the mission / problem is. (Although Quickstatements is probably the answer for adding data to wikidata once you have the QId.) --Tagishsimon (talk) 15:20, 6 May 2018 (UTC)[reply]
@Tagishsimon: Oh great, thank you. Quickstatements seems my answer to add claims to the Items. Happy now, Conny (talk) 15:36, 6 May 2018 (UTC).[reply]
@Conny: good; ping me if you need a hand with anything. --Tagishsimon (talk) 15:40, 6 May 2018 (UTC)[reply]
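For reference, the query-service lookup described above can be sketched like this (the ID strings below are hypothetical placeholders, not values from the table):

SELECT ?id ?item WHERE {
  VALUES ?id { "6000000012345678901" "6000000098765432109" } # hypothetical Geni profile IDs
  ?item wdt:P2600 ?id .                                      # Geni.com profile ID
}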

@Tagishsimon, Conny: There are many easy-to-find errors in the list you prepared, so it is questionable how it was created. See my edits, which are more transparent. Note that an ID being found in an article about a person does not mean it is the ID of the person described in the article. A possible next step could be to remove lines if the ID is already in Wikidata for the article in question. 92.229.132.140 19:20, 6 May 2018 (UTC)[reply]

Thanks for your work. I did some cleaning (discpages and metapages), if you think it is important - go on :) . Regards, Conny (talk) 19:28, 6 May 2018 (UTC).[reply]
@Conny: - thanks for YOUR work, even if I was not satisfied with the result :-) Look for Kaufhaus in your list - this is not a person. A Mix'n'match catalog would help, maybe loading all IDs found in Wikimedia projects. @Tagishsimon:, could you create one? enwiki also has more than 1000 IDs https://en.wikipedia.org/w/index.php?title=Special:LinkSearch&limit=5000&offset=0&target=http%3A%2F%2F%2A.geni.com . User:Edgars2007 did some querying of the WMF servers for BBLd https://quarry.wmflabs.org/query/7718, maybe this can be done for geni.com too. 78.55.177.69 11:33, 7 May 2018 (UTC)[reply]
There is one. Can later (probably not today) do a scan for all Wikipedias. If you want me to include only those items, that haven't got Geni already, say so. --Edgars2007 (talk) 12:01, 7 May 2018 (UTC)[reply]
Edgars2007, could you collect just the IDs? The page on which they appear is less relevant, since a geni.com person link can exist on wiki pages not about a single person and on pages of persons related to that person. For dewiki an SQL query is not that relevant anymore; they have already deleted hundreds of geni links. Can different wikis be combined in Quarry, e.g. enwiki, etwiki, lvwiki, ltwiki, ruwiki, plwiki? 78.55.254.253 13:25, 9 May 2018 (UTC)[reply]

A weird deletion request on Wikidata:Wikidata_in_Wikimedia_projects

Hi all

I just looked at Wikidata:Wikidata_in_Wikimedia_projects and there is a suggestion that it should be deleted because of a translation issue; can someone who knows about those things take a look?

Thanks

--John Cummings (talk) 14:58, 6 May 2018 (UTC)[reply]

@SLV100: if we deleted everything that isn't finished, I'm not sure if much would remain
--- Jura 15:57, 6 May 2018 (UTC)[reply]

Obsolete language code for Belarusian (Taraškievica) edition of Wikipedia

Hello, could you please help me to change in Wikidata the language code for Belarusian (Taraškievica) edition of Wikipedia from obsolete be-x-old to current be-tarask? --Kazimier Lachnovič (talk) 15:44, 6 May 2018 (UTC)[reply]

Do you mean here? It isn't possible for me either (and I think it hasn't been since the site was moved). You can ask at WD:DEV about the plans. Matěj Suchánek (talk) 16:56, 6 May 2018 (UTC)[reply]

Merge 2x Peer Anton von Saß = Peer Anton von Sass

Q52152855 = Q52693212 92.229.132.140 17:52, 6 May 2018 (UTC)[reply]

→ ← Merged --Pasleim (talk) 18:19, 6 May 2018 (UTC)[reply]

Leeuwarden's 225 names

Those of you interested in the names of places, or in Wikidata's use of aliases, may be interested in this article on Leeuwarden's 225 names. How should we reflect them in Wikidata? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:52, 6 May 2018 (UTC)[reply]

Use native label (P1705) or official name (P1448) 225 times, with the qualifiers start time (P580) and end time (P582). --Pasleim (talk) 08:33, 7 May 2018 (UTC)[reply]
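A sketch of how such qualified names could then be read back, assuming Q25390 is the item for Leeuwarden (verify the QID before relying on it):

SELECT ?name ?start ?end WHERE {
  wd:Q25390 p:P1448 ?statement .                             # official name statements
  ?statement ps:P1448 ?name .
  OPTIONAL { ?statement pq:P580 ?start . }                   # start time qualifier
  OPTIONAL { ?statement pq:P582 ?end . }                     # end time qualifier
}
ORDER BY ?start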

What's the point of having two equivalent properties?

Why are there both radius (P2120) and diameter (P2386) properties? Aren't they equivalent?--Malore (talk) 01:09, 7 May 2018 (UTC)[reply]

I agree that it would make sense to delete one of them. ChristianKl08:07, 7 May 2018 (UTC)[reply]
Some measures are traditionally given as radius, others as diameter. Even if technically they are the same, it makes sense to keep both properties. Same as with different units, in theory they could be converted to SI, but we keep them as they are originally listed.--Micru (talk) 08:27, 7 May 2018 (UTC)[reply]
We don't have two different properties for different units, and in practice, for the sake of the query service, different units do get converted to SI. ChristianKl11:09, 7 May 2018 (UTC)[reply]
But still we store them as they appear in the source. --Micru (talk) 11:30, 7 May 2018 (UTC)[reply]
If we maintain both, there should be a property constraint that warns if an item has only one of them, or if the value of the diameter is not twice the radius.--Malore (talk) 11:42, 7 May 2018 (UTC)[reply]
Malore, "twice the radius" - could there be rounding errors? 78.55.177.69 11:48, 7 May 2018 (UTC)[reply]
Micru, for places (e.g. of birth) the sources often indicate a name, but in WD this is matched to an item that can have several names, i.e. it is not stored as in the source. Sometimes I used "stated as" to indicate the name in the source. Maybe there could be a property "diameter or radius" where the editor has to indicate which of the two it is? But queries that do calculations would then also have to work with that. There might be new problems with a merged property. 78.55.177.69 11:48, 7 May 2018 (UTC)[reply]
I think radius and diameter should be kept as two different properties. In graph theory, the radius of a graph is the minimum over all vertices u of the maximum distance from u to any other vertex of the graph. And one speaks of the diameter rather than a diameter (which refers to the line itself). Also, for a convex shape in the plane, the diameter is defined to be the largest distance that can be formed between two opposite parallel lines tangent to its boundary. So used in math, radius and diameter may have quite different uses. Pmt (talk) 17:40, 7 May 2018 (UTC)[reply]
@Pmt: Actually, radius is defined as "distance between the center and the surface of a circle or sphere" and diameter as "the diameter of a circular or spherical object", so I think that they shouldn't be used in cases such as your examples.--Malore (talk) 12:54, 9 May 2018 (UTC)[reply]

Software blocks fixing - Human item - Latvian name in label copied to several other language-specific labels

I tried to fix it, but the software blocks me:

Could not save due to an error.
The save has failed.
As an anti-abuse measure, you are limited from performing this action too many times in a short space of time, and you have exceeded this limit. Please try again in a few minutes.

Fixed some on each of the following items, but not all:

Wikidata weekly summary #311

Dataset in JSON

Hello, I was working on an English-Sanskrit translator based on deep learning and need a dataset for that. How can I download a Wikidata dataset of Sanskrit words and their English counterparts in JSON format? Capankajsmilyo (talk) 14:18, 7 May 2018 (UTC)[reply]

Check out this query. The results can be downloaded in JSON. It currently only returns 1000 results, but you might be able to optimize it to get more. --Pasleim (talk) 11:37, 8 May 2018 (UTC)[reply]
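The core of such a query is pairing labels by language. A minimal sketch (label scans are expensive, so keep the LIMIT and expect possible timeouts):

SELECT ?item ?sa ?en WHERE {
  ?item rdfs:label ?sa .
  FILTER(LANG(?sa) = "sa")                                   # Sanskrit label
  ?item rdfs:label ?en .
  FILTER(LANG(?en) = "en")                                   # English counterpart
}
LIMIT 1000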

AdvancedSearch

Birgit Müller (WMDE) 14:53, 7 May 2018 (UTC)[reply]

Shape files for countries

Hi all

I recently discovered that the Wikidata Query Service could in theory create heatmaps and other kinds of maps using areas (like the one below) for countries; however, this is currently not possible because it does not hold the shape files for the countries. To me this is quite a large missing piece for visualisations. Does anyone know where these could be obtained from and what the process would be to import them?

Thanks

--John Cummings (talk) 15:33, 7 May 2018 (UTC)[reply]

  • As I understand it, the big problem is licensing. Shapefiles can now be stored on Commons (in the Data: namespace, with extension .map) -- however, WMF are insisting that only CC0 data is permitted within this namespace.
Very many sources of shapefiles (eg OpenStreetMap, UK Government) require, at the very least, attribution of authorship -- and are therefore excluded from use.
This appears to be an ideological position -- there doesn't seem to be any great technical issue involved, apart from making sure uses (eg WDQS) display attribution and licensing texts. Jheald (talk) 16:29, 7 May 2018 (UTC)[reply]
See mw:Help:Map Data, talk page, and linked phabricator thread. Jheald (talk) 17:08, 7 May 2018 (UTC)[reply]
... which leads to phab:T178210, which contains what seems to be WMF's lead statement on this so far (requested in connection with c:Commons:Deletion_requests/Data_talk:Kuala_Lumpur_Districts.map):
"Will the tabular and map data features support non-CC0 datasets?"
Currently, the tabular and map data features require a license field, that supports SPDX codes to identify the dataset's license. The feature currently supports CC0. In the future, it may support additional Free Licenses, including CC BY-SA or ODbL.
Before additional licenses can be allowed, the Wikimedia projects should (1) support attribution and other obligations contained in the license (such as when displayed in the Graph extension and other consumers of tabular and map data), and (2) provide users with appropriate community guidelines on what material and license is acceptable. This support may require additional feature development that is not currently planned, but open for future open source contributions.
Personally, I'd like to see Commons take a lead on this, and tell WMF that the community *will* accept map files under open licenses, with workarounds to get round the current technical limitations (eg comment lines in the file saying the stated licence is incorrect, the true licence is ...), and tell the WMF that since Commons *will* be accepting these files, it is therefore now a technical priority to make sure open licences that are not CC0 are accurately presented.
But I don't know whether Commons has the chutzpah and self-confidence in itself as a community to go for a stand like that, to force the point. Jheald (talk) 17:48, 7 May 2018 (UTC)[reply]
For reference c:Commons:Village_pump/Proposals/Archive/2017/10#Proposal_to_include_non-CC0_licenses_for_the_Data_namespace is probably the most extensive discussion of this so far on Commons, to date. Jheald (talk) 21:52, 7 May 2018 (UTC)[reply]

Please only CC0. Wikidata is CC0, and simple shapefiles, especially for countries on a world map, should not be bundled with other licences. KISS. 78.55.177.69 17:53, 7 May 2018 (UTC)[reply]

Why not? I presume you wouldn't object to an SVG file stored on Commons being used as a backdrop, or returned as one of the images by a WDQS query. So why object to Commons providing the same information in a shapefile format? Jheald (talk) 18:15, 7 May 2018 (UTC)[reply]

OK, so the question seems to be where can we find CC0/public domain shape files for countries? (I also asked here) --John Cummings (talk) 18:10, 7 May 2018 (UTC)[reply]

@Jheald: @Bluerasberry:, I think I found some :) Natural Earth Data; it clearly states PD/CC0 on its terms of use page. Does anyone who knows about maps want to check whether they are importable, and would be able to upload some? I'm very happy to do grunt work but have no idea how to do it. --John Cummings (talk) 18:24, 7 May 2018 (UTC)[reply]

Hi, I am a contributor/member of the "NaturalEarth-Wikidata concordances" project. The next version, Natural Earth v4.1, is very close to release, and will contain the "wikidataid" for a lot of tables (like admin_0_countries, rivers, lakes, airports). Current status: https://github.com/nvkelso/natural-earth-vector/pull/249 ; https://github.com/nvkelso/natural-earth-vector/issues/224 ; For example, my matching sheets: naturalearth-wikidata-20180208-admin_0_countries ; --ImreSamu (talk) 09:37, 8 May 2018 (UTC)[reply]

@ImreSamu:, fantastic news, if there's anything people can do to help please let us know :). --John Cummings (talk) 09:40, 8 May 2018 (UTC)[reply]

I am not sure if this is the best way. We have the Kartographer extension, which uses geoshapes straight from OSM (see it in use for example in cawiki templates example or commons template infobox wikidata example). There is no need to upload shapes to Commons (the current implementation is very clumsy - impossible to use the upload form, no categorization, very slow (see my example), etc.); just mark a node/way/relation with a wikidata tag (currently 527 K nodes, 180 K ways and 282 K relations tagged). What about WD+OSM federated queries?--Jklamo (talk) 09:56, 8 May 2018 (UTC)[reply]

@Jklamo: both are useful in different ways I think; e.g. your suggestion would not work for areas that do not exist in OSM, like historic regions, distribution areas for species, or any other non-geographical or political area. Your suggestion would be great to have as an option; do you know how we get from where we are now to this being a functional option that works inside the Wikidata Query Service? Thanks John Cummings (talk) 12:34, 9 May 2018 (UTC)[reply]

@Sic19: You have made shapes at User:Sic19#GeoShapes. Do you have any guidance for this conversation? Is there documentation published somewhere? Blue Rasberry (talk) 14:25, 9 May 2018 (UTC)[reply]

The most useful documentation I've found is the Kartographer extension page that Jklamo referred to above, which covers the creation of maps using Commons map data and external OSM data. Very basic licensing information is given on Help:Map Data. I've experimented with both geoshapes stored on Commons and those imported from OSM - there are benefits and drawbacks to both approaches - and I would suggest gaining familiarity with the datatype is necessary before planning to use it at scale. For example, on my userpage there are two very similar looking examples of the National Library of Wales collections - on the left is from Commons and the right is OSM data - I like the option to combine geoshapes on Commons to show the Library's galleries but being able to link from the OSM data to a SPARQL query is nice too. Shame that they can't be combined though. Commons geoshape data is held back by licensing and OSM by the relative lack of Wikidata-tagged objects. There is loads of potential to do interesting work with this datatype and it will improve with time I suspect.
I've previously shared a few thoughts about the licensing situation on commons:Data talk:Chepstow Castle.map. One last thing, it is quite easy to produce your own geoshapes at geojson.io and then copy the data to Commons. Simon Cobb (Sic19 ; talk page) 18:27, 9 May 2018 (UTC)[reply]

@John Cummings: A world political boundaries shapefile dataset is available on a cc-0 license here: http://dx.doi.org/10.7488/ds/1789 and can be viewed/converted to geoJSON at mapshaper.org. It looks OK and we can put it on Commons - let me know if you need any help. Simon Cobb (Sic19 ; talk page) 18:59, 9 May 2018 (UTC)[reply]

@Sic19:, thanks. I notice there's also a more detailed repo at http://hdl.handle.net/10672/124; however, I can't see any license information. I have no idea how to upload the shape files; if someone can tell me how to do it, I'm happy to create a page with a to-do list and make a start. Thanks, --John Cummings (talk) 14:18, 10 May 2018 (UTC)[reply]

It's also cc-0 and really nice quality. The file size is a problem - Commons geoshapes are limited to just over 2,000kb and quite a few individual countries in the high quality dataset are well over that limit. I don't think you'd gain much from the extra detail if the primary use is visualising data at country level - I've made an example in my Sandbox of the Central African Republic from both datasets (red border = high quality; yellow = low) - it is only when you zoom in that the difference becomes obvious. Simon Cobb (Sic19 ; talk page) 18:45, 10 May 2018 (UTC)[reply]

Bot-populating family names?

We currently have a significant number of entries for human (Q5) that don't have family name (P734) (out of the 100,000 current uses of Wikidata infoboxes on Commons on all topics, 27,000 are humans without family names). Hopefully that will improve now that it's higher up in the suggested properties, but they're quite important for sorting people categories on Commons, so I'm wondering if there's a good way to populate those by bot. Maybe by:

  • looking for instance of (P31)=human (Q5) with a value for given name (P735) but not for family name (P734), where the label minus the given name (P735) has no spaces after pre/post whitespace-trimming and equals the label of an item that has a description like 'family name' (might work for simple names, like Ad Wouters (Q15917124), but not for complex ones - and maybe could be expanded to at least handle Western middle names; a candidate-finding query is sketched below)
  • Do something similar with DEFAULTSORT parameters in Commons categories, since they're mostly of the form "Last, First" (only works for entries that have commons categories - maybe 0.5 million? Might be able to cross-check against the commons category name and the Wikidata labels)
  • Ditto by extracting the information from commons:Template:PeopleByName (but that's only used in ~30k categories)
  • Possibly also auto-creating new 'family name' items where they don't already exist

Any thoughts? Is this worth looking into further, or is it best left to human edits? Thanks. Mike Peel (talk) 23:52, 7 May 2018 (UTC)[reply]
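For the first bullet above, the candidate set could be pulled with something like this sketch (label comparison and string processing would then happen offline):

SELECT ?person ?personLabel ?givenLabel WHERE {
  ?person wdt:P31 wd:Q5 ;                                    # human
          wdt:P735 ?given .                                  # has a given name
  FILTER NOT EXISTS { ?person wdt:P734 [] . }                # but no family name
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100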

Importing datasets under incompatible licenses

I learned on a Phabricator ticket about at least one dataset where the uploader (User:Pintoch) states that the imported dataset, RNSR, is not under a CC-0 license. The ticket claims that this is a frequent occurrence (and lists one other example, PubMedCentral, which I haven't dug into in detail yet).

I would like to ask the community to discuss this issue and to decide on remedial steps. Here are a few possible suggestions - feel free to add some yourself. I was hoping that we already have policies for all of these in place - I didn't check, I just saw the contributions on Phabricator and shook my head. If all of these policies are already in place, then I would hope that we actually enforce them.

Proposal 1: when the proposal for importing a dataset is made on Wikidata:Dataset Imports, we need to have a field that explicitly discusses the license of the dataset to be imported, or explains why this is not needed. No dataset import may be approved without checking the license of the dataset for compatibility.

Proposal 2: contributors who knowingly import data from datasets with an incompatible license should be warned, and if warnings don't help, blocked.

Proposal 3: data imported from datasets with incompatible licenses should obviously be removed.

The whole thing has a hook that is massively problematic: importing a dataset with an incompatible license is not allowed - but referencing a dataset with an incompatible license on a statement, is. So if we get data from dataset A, which has a CC-0 or PD license, and then add references to dataset B, which has a CC-BY-SA or proprietary license, that's totally OK. But we can't import from dataset B directly.

What do people think? --Denny (talk) 01:09, 8 May 2018 (UTC)[reply]

P.S.: there are a number of side discussions that I would like to avoid, in particular about whether certain datasets are licensable at all. E.g. one could argue that the RNSR cannot be licensed by the French government, because it is incidental data that they need to have anyway (similar to timetable data for a public transport organization). These are important discussions, and I would be happy if we as a community decided to test the laws in such cases, as we did for the monkey selfie, but let's keep it simple for now and just assume that if an organization publishes a dataset under a specific license that this license does actually apply. --Denny (talk) 01:09, 8 May 2018 (UTC)[reply]

From an Italian point of view, we decided NOT to bulk import ANY dataset released under CC BY or above, because it is still not clear what the consequences of this might be. This doesn't mean that those datasets cannot be cited as sources, though, since individual data points cannot be copyrighted.
There is also the more general topic of "Italian institutions are really afraid of CC0, and most of them consider CC0 to be too American-like (sic!), so they prefer CC BY, also because it offers them more legal protection in case of citation", but I don't know if it's the right place, so I'll just leave that there. --Sannita - not just another it.wiki sysop 07:57, 8 May 2018 (UTC)[reply]
« if an organization publishes a dataset under a specific license that this license does actually apply. » Then you can *never* import even a single datum from a French database, as the BY is *always* mandatory in France if you strictly follow French law. Compatibility is never an easy thing, which is why I fear that your proposals 1 and 3 should be less strong and more pragmatic (I don't want all the names and codes of French cities removed just because they are under copyright of the Code officiel géographique (Q2981593)). Cdlt, VIGNERON (talk) 08:16, 8 May 2018 (UTC)[reply]
I have the feeling that the problem with all three proposals and data imports in general is that there is nobody around who really knows what is legally allowed. Most datasets don't have a compatible license but one can argue that factual data contained in a database is not protected even if the database itself is protected. [9]. To implement any of the above proposals we first need people who can give legal advice. --Pasleim (talk) 08:57, 8 May 2018 (UTC)[reply]
As far as reusing existing data banks is concerned, only those whose terms legally enable relicensing them under CC-0 can be used, and CC-BY certainly does not, so it is more careful not to attempt such an import. Are there online resources with official statements from Italian institutions such as the one you mention? Individual data points probably can't be copyrighted in general, but that's not the topic. This discussion pertains to "substantial parts" of data banks, which can be subject to restrictions on use on various legal grounds. Concerning the patrimonial right in France that @VIGNERON: points to, I think that CC-0 indicates that people waive rights in a way that is as extensive as the law allows, but not more (obviously). So, as long as a work is covered by droit d'auteur (let's say copyright as implemented in France), attribution is a legal requirement for derivative works (there are also limits and exceptions to that, but they don't apply to a wide public project like Wikidata). That doesn't mean that you can't use CC-0 to release a work, but people who reuse your work are still legally required to respect proper attribution. Thus I guess that to make Wikidata really useful for French people when dealing with copyrighted material, Wikidata should always provide a way to retrieve proper attribution, to leave these cheese eaters (🤣) in a position to respect their local law. --Psychoslave (talk) 14:01, 8 May 2018 (UTC)[reply]
I'd suggest Proposal 4: switch Wikidata to a CC BY-SA license. This would solve license incompatibilities in imports and between Wikidata and other Wikimedia projects, fix Wiktionary integration (ie, not having to do it all again from scratch) and ensure that the data we work so hard for does not end in a proprietary silo. Three birds with one stone. NMaia (talk) 10:39, 8 May 2018 (UTC)[reply]
This is not really an option. CC0 is a "one way street". Once you have released your work under a CC0 license you cannot go back and claim some copyright, for example with a CC-BY license. See the Who can use CC0 section of the CC0 FAQ from Creative Commons. The only realistic option would be to create a Wikidata 2 and build it again from the sources (Wikipedia etc., but not the current Wikidata)... Robevans123 (talk) 12:02, 8 May 2018 (UTC)[reply]
While it's true CC0 data can't go back to being copyrighted, new content can be released under new, more protective terms. NMaia (talk) 12:37, 8 May 2018 (UTC)[reply]
Well, taking the hypothesis of datasets illegally imported into Wikidata, they were never legally made available under CC0. So it's not about going back, it's about regularizing what is provided in Wikidata, for example by moving these datasets out of Wikidata or changing the license policy (other ideas are welcome, of course). --Psychoslave (talk) 15:16, 8 May 2018 (UTC)[reply]
Is anything imported from Wikimedia projects actually copyrightable? Simple facts are not. Ghouston (talk) 11:15, 8 May 2018 (UTC)[reply]
Wiktionary definitions would be copyrightable and importable if we used a compatible license, for instance. NMaia (talk) 12:37, 8 May 2018 (UTC)[reply]
At the very least it is a really doubtful situation, which is no better when one wants to be able to honestly promote Wikidata as a database fully under a CC-0 license, reusable by anyone under the terms of that license without fear of any legal issue. --Psychoslave (talk) 14:11, 8 May 2018 (UTC)[reply]
Ghouston yes, the vast majority of Wikidata data is clearly non-copyrightable facts, but maybe not all (I'm thinking of certain string and text properties, where in some very limited cases one can raise the question of legality). Psychoslave, you do know that zero risk doesn't exist (or only as a bad decision-making bias), and for most of the Wikidata data the risk of an issue is so limited that it can easily be ignored (and AFAIK there has been no issue in 5 years); if not ignored, it's easy to just take a look at the sources for reassurance. Cdlt, VIGNERON (talk) 08:15, 9 May 2018 (UTC)[reply]
I agree with @VIGNERON: that zero risk doesn't exist. But on the other hand, being under an acceptable threshold of uncertainty is possible and should be targeted. What are the hypotheses and metrics on which you ground the conclusion that "the risk of issue is so limited, it can easily be ignored"? I don't share this conclusion, because there are laws which raise serious doubts about what can be transferred from one data bank to another. That we haven't faced any court judgment in five years is not proof that we don't break the law. I think we don't need to wait for issues to reach such a critical state to tackle them. For example, @Denny: as an official WMDE team member admitted as early as 2012 that large extraction from Wikipedia could not be allowed. But today there are around 50M statements in Wikidata which were directly imported from Wikipedia. So either some convincing arguments were raised that changed this official legal point of view, or we are clearly and consciously making imports which are considered disallowed. Or there may be another possibility that I haven't envisioned. Anyway, it would be important to make an official statement about that; is there any official statement @Lydia_Pintscher_(WMDE):? And if this does apply to Wikipedia, how would this not apply to other data banks whose significant parts were extracted and imported into Wikidata (except when they are clearly under CC-0-compatible terms, of course)? --Psychoslave (talk) 07:25, 10 May 2018 (UTC)[reply]
I think Denny got it wrong back then, or his words weren't precise enough. Just because some data is included in a CC-BY-SA-licensed work (as Wikipedia is) doesn't mean that this data is copyrightable at all. The license protects creativity and originality, but it does not protect each and every word/data/aspect in the work. This is the case with other seemingly incompatible works as well: just because someone claims to have copyright in a work doesn't necessarily mean that the work actually enjoys any copyright protection. So: I don't understand how you can worry about Wikipedia imports, and I don't consider them disallowed. —MisterSynergy (talk) 07:48, 10 May 2018 (UTC)[reply]
The problem is not whether every single datum included in Wikipedia is copyrightable as a single element; of course it is not. But articles as a whole, and Wikipedia as a whole, are copyrighted and licensed. It is substantial extraction of this copyrighted material, licensed under CC-by-sa, which is a legal issue. It would not be an issue to provide identical data obtained from miscellaneous other sources, provided that no similar irregular massive extraction were involved. It would not be an issue to provide this massively extracted and imported data under a CC-by-sa license. Wikipedia instances constitute an original collection of data. If Wikidata doesn't like the conditions under which Wikipedia provides data, the option of getting data by other means compatible with CC-0 is open. At the end of the day, the claim that Wikidata is under CC-0 is simply not reliable enough to convince people serious about license issues, such as the OSM community. What's the point of claiming a CC-0 license if downstream users can't rely on that statement? --Psychoslave (talk) 20:31, 10 May 2018 (UTC)[reply]
Do you have a definition what a substantial extraction is? To my knowledge this is not clearly defined (or even completely undefined), just as many other details about database rights have not yet been well contested in court. —MisterSynergy (talk) 21:30, 10 May 2018 (UTC)[reply]
I am not a lawyer and can therefore not make an official statement on legal matters, sorry. However, there is m:Wikilegal/Database Rights from the WMF legal team, which I rely on. --Lydia Pintscher (WMDE) (talk) 10:31, 10 May 2018 (UTC)[reply]
It's always nice to recall the existence of this document for those who haven't read it yet, but it doesn't answer whether Wikidata will continue to officially ignore concerns about legal issues or not. So basically, what is the official Wikidata interpretation of this "preliminary perspective" legal text, and how does it indicate what should apply to the extraction of substantial parts of data banks and their importation into Wikidata? If the product manager for Wikidata is not the person who can answer this, whom should we contact, or which defined process should we follow to obtain a clear, explicit decision on this point? Thank you in advance for any suggestion on this. --Psychoslave (talk) 21:27, 10 May 2018 (UTC)[reply]
On this point I would rather make Proposal 4.1: let contributors record the license that applies to each piece of data depending on its source (or a set of licenses chosen by the contributor if this is original data). Of course only free licenses should be accepted, so this wouldn't remove the need for massive imports to respect the license policy, just shift it, but it would already make Wikidata far more flexible and useful than it currently is with its CC-0-only license. Alternatively we might have Proposal 4.2: let the community launch its own Wikidata instances to host whatever free-licensed data they want, in a relational database under whichever free license fits their needs. But it appears to me that at least this last proposal is really outside the focus of this section.--Psychoslave (talk) 15:16, 8 May 2018 (UTC)[reply]

For the sake of precision on Italy: CC-BY-3.0-IT (the non generic version) is fine because it waives sui generis database rights. For other imports, it matters whether there is any copyright (on the individual pieces of data) or database right. On the topic, see m:Wikilegal/Database Rights. --Nemo 15:14, 8 May 2018 (UTC)[reply]

That makes me wonder, especially given what was additionally said in this section: does Wikidata provide sufficient information about its data sources to enable reusers to ensure they are respecting the law in their specific jurisdiction? It's one thing to be sure that Wikidata is not doing anything illegal through what it publishes; it's another to give its users all the information (or even some helping tools) to make sure they can legally use the data they are interested in, in their own jurisdiction. --Psychoslave (talk) 15:23, 8 May 2018 (UTC)[reply]

OpenStreetMap

User:Mateusz Konieczny has just added a section to Wikidata:OpenStreetMap, saying "Copying data from OSM to Wikidata is not allowed". In the light of the above, ongoing, discussion, should such a bold statement be made? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:00, 8 May 2018 (UTC)[reply]

What do you mean? What in the ongoing discussion inclines you to conclude that this is a bold statement? OSM is licensed under the ODbL, which is clearly incompatible with CC-0. If data from OSM are imported into Wikidata, then Wikidata must be published under the ODbL, according to the OSM legal FAQ. --Psychoslave (talk) 21:36, 8 May 2018 (UTC)[reply]
Well: the licenses are not compatible, so this is true. It's a bit harsh and maybe it should be rephrased, but ultimately it's OK (and the same has been written on the OSM wiki page for ages). Plus, we already have a lot of data similar to OSM from other sources, so why would you want to copy OSM? I use OSM a lot, but to check that data are consistent on both sides, I look at the sources and correct on both sides if there are discrepancies; no need to import or even copy anything. Cdlt, VIGNERON (talk) 07:59, 9 May 2018 (UTC)[reply]
You assert that what is currently being debated is true. It is clear that there is not agreement on this. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:50, 10 May 2018 (UTC)[reply]
So far you have provided no argument beyond stating "basic facts that are not copyrightable". It is true, but you jump directly to "and therefore the OSM license may be ignored and OSM data is importable to Wikidata". Note that just because basic facts are not copyrightable, it does not mean that something that consists (among other things) of basic facts is also not copyrightable. Mateusz Konieczny (talk) 15:28, 10 May 2018 (UTC)[reply]
I made no such jump. Please do not put words in quote marks and attribute them to me, when I never said them. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:21, 10 May 2018 (UTC)[reply]
Do you have any evidence at all that the ODbL can be completely ignored in the USA (Wikidata's jurisdiction)? I am not a lawyer, but "as long as you credit OpenStreetMap and its contributors" and "If you alter or build upon our data, you may distribute the result only under the same licence" from https://www.openstreetmap.org/copyright seem kind of incompatible with CC0. Mateusz Konieczny (talk) 08:44, 10 May 2018 (UTC)[reply]
Where have I suggested that ODBL "can be completely ignored in the USA"? Please avoid straw-man arguments. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:50, 10 May 2018 (UTC)[reply]
Either the ODbL license of OSM data can be ignored in the USA (maybe via "we will import everything in parts and claim that each part is a fact not protected by copyright or similar restrictions"), or importing OSM data into Wikidata would break copyright and copyright-like restrictions. Mateusz Konieczny (talk) 14:06, 10 May 2018 (UTC)[reply]
False dichotomy. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:20, 10 May 2018 (UTC)[reply]
@Pigsonthewing: can you check whether there is any evidence that importing from OSM to Wikidata is allowed? I encountered some mentions of automated imports of OSM data (under the ODbL) into Wikidata, and I plan on tracking down these copyright violations and requesting cleanup. If the OSM license can be ignored, it would be a waste of time. Mateusz Konieczny (talk) 08:46, 10 May 2018 (UTC)[reply]
Wrong question. Do you have any evidence that it is prohibited by Wikidata/ the WMF? By law? The OSM community may assert that it is not allowed, but they are - it is argued above - not in a position to apply such rules to basic facts that are not copyrightable. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:50, 10 May 2018 (UTC)[reply]
As far as I know, the burden of proof is on people wishing to import something (at least it works this way on Wikimedia Commons, enwiki, plwiki) - maybe Wikidata has a different approach to copyright, and everything is accepted as out of copyright and related restrictions until proved otherwise? Mateusz Konieczny (talk) 14:10, 10 May 2018 (UTC)[reply]
You asserted "Copying data from OSM to Wikidata is not allowed" - I'm asking you to substantiate that assertion. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:22, 10 May 2018 (UTC)[reply]
See https://www.openstreetmap.org/copyright (note: IANAL; as I mentioned, there may be a way to argue that the ODbL does not apply in the USA) Mateusz Konieczny (talk) 15:18, 10 May 2018 (UTC)[reply]
I've already seen it. It does not substantiate your assertion. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:20, 10 May 2018 (UTC)[reply]
See this conclusion of the Wikimedia Foundation’s preliminary perspective on this legal issue: In the absence of a license, copying all or a substantial part of a protected database should be avoided. That is, for the case of Wikidata, the absence of a free license compatible with CC-0 should lead to not import data from a data bank. --Psychoslave (talk) 19:54, 10 May 2018 (UTC)[reply]
Thank you. The statement "Copying data from OSM to Wikidata is not allowed" does not limit itself to "all or a substantial part" of the OSM database. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 04:55, 11 May 2018 (UTC)[reply]
"by Wikidata" AFAIK Wikidata has no page documenting restrictions on what may be imported from a copyright side Mateusz Konieczny (talk) 14:10, 10 May 2018 (UTC)[reply]
So not prohibited by Wikidata, then. And my other question? Who does make the prohibition which you claim exists? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:22, 10 May 2018 (UTC)[reply]
"So not prohibited by Wikidata, then" - rather not documented. At least I assume that at least some data is not importable due to copyright concerns (not documenting it is not changing anything) Mateusz Konieczny (talk) 15:24, 10 May 2018 (UTC)[reply]
"Some data" != "all data". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:20, 10 May 2018 (UTC)[reply]
I also marked Wikidata:Bot_requests#Fetch_coordinates_from_OSM as resolved, as such an import would require changing the OSM license. Mateusz Konieczny (talk) 09:38, 10 May 2018 (UTC)[reply]
Again, this is subject to ongoing debate. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:50, 10 May 2018 (UTC)[reply]

And a similar section has just been added at Wikidata:OpenStreetMap#Importing data from OSM. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:22, 10 May 2018 (UTC)[reply]

It is about Wikidata:OpenStreetMap#Importing_data_from_Wikidata_into_OSM, right? In that case it should rather be discussed on OSM (it is not something decidable by the Wikidata community) Mateusz Konieczny (talk) 15:21, 10 May 2018 (UTC)[reply]

Was it a good idea to create items for these authors?

I'm interested in creating Wikidata items related to scientific publications in paleontology. Today I made items for each of the authors of a paper, not realizing the paper had a pre-existing item. This item used some kind of "string" format rather than referring to items for the authors. Was it a bad idea for me to have created those items for the authors, or should the current item for the paper be reformatted to refer to them for authorship data? I'm not very experienced here on Wikidata and I was hoping you could offer some guidance on how to handle situations like this, and about handling authorship more generally. Abyssal (talk) 01:47, 8 May 2018 (UTC)[reply]

Items like Maria Luísa Morais (Q52782144) are likely to be nominated for deletion for not meeting Wikidata:Notability. Basically they need either a sitelink or "serious and publicly available references". Ghouston (talk) 02:25, 8 May 2018 (UTC)[reply]
I'm not sure if it's possible to save that item or not. They have 33 publications listed at [10], but I can't see any online information that could be put into the Wikidata item. If their CV was online somewhere it would give some information, but I'm not sure if a CV counts as a serious reference. Ghouston (talk) 02:42, 8 May 2018 (UTC)[reply]
@Ghouston: The paper referenced in the OP described the genus and species Cardiocorax mukulu; is there any reason why its authors couldn't be given Wikispecies pages and the items kept on that basis? Abyssal (talk) 02:47, 8 May 2018 (UTC)[reply]
Yes, if they are in scope for Wikispecies they can have a sitelink, and that's good enough for Wikidata. Ghouston (talk) 02:49, 8 May 2018 (UTC)[reply]
@Ghouston: Now that we've confirmed that these author items deserve to exist, how do we use them in the item on the paper itself? Abyssal (talk) 03:37, 8 May 2018 (UTC)[reply]
Create author (P50) statements and then delete the author name string (P2093) statements. Ghouston (talk) 03:46, 8 May 2018 (UTC)[reply]
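When switching a paper over, other works still crediting the same string can be located with a sketch like this (exact string match only; the name value is illustrative):

SELECT ?paper ?paperLabel WHERE {
  ?paper wdt:P2093 "Maria Luísa Morais" .                    # author name string
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}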
@Abyssal: If the authors have ORCID iDs, and their papers DOIs, and the latter are listed on the former's ORCID record(s), then ORCIDator will do that for you. See also m:WikiCite. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:06, 8 May 2018 (UTC)[reply]
@Abyssal: I believe we have generally concluded that authors of scientific articles, if they can be identified as individuals in some fashion (for example with an ORCID identifier, or a sufficiently unique name and affiliation in their published works) do meet wikidata's notability criteria via the structural need purpose (to interlink their authorship records) and you don't need further evidence of notability here. @Daniel Mietchen, Fnielsen, Egon Willighagen: among others have done some work on changing author name strings to author items in our data. ArthurPSmith (talk) 13:45, 8 May 2018 (UTC)[reply]
ORCID is as trustworthy as IMDb, as people can create their IDs themselves. I highly doubt that an ORCID iD alone is enough to be notable. Sjoerd de Bruin (talk) 14:30, 8 May 2018 (UTC)[reply]
And I didn't say it was. An ORCID combined with authorship of a paper with a Wikidata item, however, is sufficient to identify a person uniquely and create an item for them. ArthurPSmith (talk) 17:48, 8 May 2018 (UTC)[reply]
The paper is cited to support a claim in Cardiocorax mukulu (Q20722001). This creates a structural need to create items for A new elasmosaurid from the early Maastrichtian of Angola and the implications of girdle morphology on swimming style in plesiosaurs (Q29037679), its authors, and its publisher. Jc3s5h (talk) 16:50, 8 May 2018 (UTC)[reply]
  • Keep Yes keep making items for authors. Yes add the author property to items with author name strings. This author, Morais, now has more properties. In general I advocate that authors of publications which Wikidata indexes have their own Wikidata items. I think that the quality of ORCID and IMDb and Wikidata content is currently comparable. Misinformation and hoaxes are not common on any of these. Since Wikidata is open, cross checks with other databases will make misinformation more identifiable, but for now I think we should cross import data from databases like this. I wish we could import the entirety of ORCID and IMDb, or at least have these staged for consideration in a Wikibase instance.
A person named as author of any publication with a Wikidata item meets Wikidata:Notability. When any individual is the author of multiple publications in Wikidata then an item for that person becomes very useful. Blue Rasberry (talk) 17:51, 8 May 2018 (UTC)[reply]
There's actually no structural need in this case, because author name string (P2093) allows the authors to be named without creating items for them. I had the impression that the articles in Wikidata are generally here because they are used on a Wikimedia project somewhere; or is it possible to create items for articles at random? Ghouston (talk) 03:15, 9 May 2018 (UTC)[reply]
I don't think "structural need" should be interpreted to mean "there is no way to represent this information at all unless an item is created". I think, rather, if there is a structural need for the information, and creating an item is the normal and preferred method for representing that type of information, then the informationitem should be created. Also, if an author is cited to support one claim, there is a good chance other works by the same author will be cited to support other claims, and having an item will allow the various works to be linked.
In addition, creating an item for the author is called for by Help:Sources. Jc3s5h (talk) 12:06, 9 May 2018 (UTC)[reply]
So, any article can be added provided it's used to source a statement somewhere, and all of its authors can also be created. That brings a lot of scientists, and other writers like journalists, into notability, which doesn't actually bother me. Creating items for the 5154 authors of Combined Measurement of the Higgs Boson Mass in p p Collisions at √s=7 and 8 TeV with the ATLAS and CMS Experiments (Q21558717) would be fun. Ghouston (talk) 12:51, 9 May 2018 (UTC)[reply]
If you have enough information to say that multiple papers are written by the same author there's a structural need for the author item as "author name string" doesn't hold the information that multiple papers are written by the same person. ChristianKl13:04, 9 May 2018 (UTC)[reply]
Yes, it could be useful in this case, assuming they can be reliably linked as the same person, whereas making one for somebody hypothetically named "S. Brown" who we only know works at CERN, would be basically a waste of an item. Ghouston (talk) 13:17, 9 May 2018 (UTC)[reply]
We don't know him or her as just S. Brown of CERN, we know him or her as S. Brown of CERN, the author of Qnnnn. Later, Brown might write another article that explicitly mentions Qnnnn as being his or her work, in which case we would be able to expand on the structure that was started upon citing Qnnnn. Jc3s5h (talk) 16:21, 9 May 2018 (UTC)[reply]

Pinging more than 50 participants in a given WikiProject

WikiProject India has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

So WikiProject India had 82 members before I culled from that list those who lacked more than 50 contributions on Wikidata, those who didn't edit (or edited three times or fewer) in 2018, and those with fewer than 50 edits in 2018 but none in the last month (bringing the number of participants down to 48). This filtering made the ping I added above work properly, but it wouldn't have worked before. Apparently there is a limit on the number of users one can ping in a single edit, about which Zolo informed me on {{Ping project}}'s talk page, which makes it difficult to call on the members of a large WikiProject. Is there any good way to get around this to make {{Ping project}} work when a single WikiProject has more than 50 participants? Mahir256 (talk) 02:10, 8 May 2018 (UTC)[reply]

I don't think it will be possible to mention more than 50 users in one edit, but you may file a request on Phabricator if you have some arguments for raising the limit. Maybe you could just split the list into two lists and modify template:ping project in a way that lets you call the second list, e.g. {{ping project|India|2}} → Wikidata:WikiProject_India/Participants/2. But this would require making two signed edits to call every member of the wikiproject, i.e.:
  1. {{ping project|India}} ~~~~
  2. {{ping project|India|2}} ~~~~
Originally the limit was higher I think, but the current number of 50 was introduced to prevent situations where someone accidentally types {{Wikidata:Project Chat}} instead of [[Wikidata:Project Chat]]; spammers are the second reason (I remember that we had a similar problem with 'thanks' notifications on pl.wiki — a dynamic-IP spammer was sending a few hundred (per user) thanks notifications), so if it can be achieved in another way, I'd suggest leaving the limit as it is. Wostr (talk) 18:01, 9 May 2018 (UTC)[reply]

Draft for the RDF mapping of Wikibase Lexeme

Hello all,

One of the most anticipated features for lexicographical data on Wikidata is the ability to run queries. As previously announced, this will not be available for the first release on May 23rd, but you can already add some ideas for queries.

One of the steps towards the ability to query the data is to have an RDF mapping ready. This task has been started by Tpt (thanks!), who created a draft RDF mapping for Wikibase Lexeme. If you have knowledge of the topic, feel free to have a look and leave comments directly on the talk page.

Cheers, Lea Lacroix (WMDE) (talk) 12:56, 8 May 2018 (UTC)[reply]

New feature for the Query Service: check the location of the browser

Hello all,

The Wikidata Query Service now offers the possibility to build queries that include your current location. You can use the code [AUTO_COORDINATES] in a query to ask for the location. When running the query, the browser will ask for your current location.

For example, here's a query showing the items that are located around you, with markers colored depending on P31:

#defaultView:Map
SELECT ?place ?placeLabel ?image ?coordinate_location ?dist ?instance_of ?instance_ofLabel ?layer WHERE {
  SERVICE wikibase:around {                                  # items within a radius of the centre
    ?place wdt:P625 ?location.
    bd:serviceParam wikibase:center "[AUTO_COORDINATES]".    # replaced by the browser's location
    bd:serviceParam wikibase:radius "1".                     # radius in km
    bd:serviceParam wikibase:distance ?dist.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  OPTIONAL { ?place wdt:P18 ?image. }
  OPTIONAL { ?place wdt:P625 ?coordinate_location. }
  OPTIONAL { ?place wdt:P31 ?instance_of. }                  # bind P31 so the selected columns are filled
  BIND(?instance_of AS ?layer)                               # ?layer drives the marker colours
}
Try it!

Related to this feature, two improvements have been made to the interface, on any query displayed as a map:

  • When the map is displayed, a "marker" button is included on the left side to show your current location
  • A mini-map is displayed in the corner of the map to show a bigger view of the location

Feel free to test it with your favorite queries and let us know if you encounter any problem. Lea Lacroix (WMDE) (talk) 14:08, 8 May 2018 (UTC)[reply]

@Lea Lacroix (WMDE): The browser did not ask for my location, it just showed the map of Berlin without any prompt.--Micru (talk) 14:18, 8 May 2018 (UTC)[reply]
The browser (Firefox) asked permission to show the location. Did you allow it to remember your choice for Wikidata earlier in the same browser, for "nearby" or something else? --Titodutta (talk) 17:58, 8 May 2018 (UTC)[reply]

👍, --John Cummings (talk) 20:58, 8 May 2018 (UTC)[reply]

I got a location in Berlin too, although I'm in Australia. I agreed to the browser location request, but I don't think my browser knows my location. Ghouston (talk) 03:03, 9 May 2018 (UTC)[reply]

Thanks for your feedback. Indeed, when you don't agree to share your location, or the browser is not able to detect it, the current setup displays some coordinates in Berlin instead. Lea Lacroix (WMDE) (talk) 06:09, 9 May 2018 (UTC)[reply]

Great new feature! It works for me, but it is slower than my existing queries using BIND(geof:distance(?coord1, ?coord2) AS ?distance) combined with filter(?distance < 1000), where I define ?coord1 as the center of the circle. I get timeouts modifying your example query when I go beyond 1000 km, even though there are fewer than 1000 matching places.--37.201.98.147 11:58, 10 May 2018 (UTC)[reply]
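For comparison, the geof:distance pattern described above looks roughly like this; a sketch only, using Berlin (Q64) as a fixed centre and restricting to airports (Q1248784) so the scan stays bounded:

SELECT ?airport ?airportLabel ?dist WHERE {
  wd:Q64 wdt:P625 ?center .                                  # centre: Berlin's coordinates
  ?airport wdt:P31 wd:Q1248784 ;                             # airports only, to bound the scan
           wdt:P625 ?loc .
  BIND(geof:distance(?center, ?loc) AS ?dist)                # distance in km
  FILTER(?dist < 100)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?dist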

Request for mass correction of items with "cheval de course" (racehorse) in P31

Hello, quite a large number of items about horses have "cheval de course" (racehorse) as their P31 value; this is an error: no horse is a racehorse by nature, only if its owner decides to race it (a "racehorse" may in fact never have raced, but simply had a career at stud). Is it possible to correct this property value to plain "horse" with a bot? And to create a new property, specific to animals, that would be the equivalent of "occupation" (reserved for humans), but for animals? Thanks in advance! --Tsaag Valren (talk) 15:59, 8 May 2018 (UTC)[reply]

@Tsaag Valren: Yes, a bot can do the work, or you can do it yourself if you want. You can prepare the data in an Excel sheet following a defined structure and run a script that 1) deletes the statement P31: racehorse, 2) creates the statement P31: horse, and 3) creates a statement occupation: racehorse. The tool is QuickStatements 2; see Help:QuickStatements/fr. However, before launching a mass modification, you should get a data-modelling concept for horses validated on Wikidata, because if your idea is not clearly presented and accepted, your work can easily be undone by the same means. That is the risk with WD: seeing large-scale changes to the data structure made possible by automated tools.
I saw that you are part of the Wikidata:WikiProject Equines project, so I suggest you open a "Data modelling" section or subpage there and prepare a description of the model, clearly separating the model for a breed from the model for an individual horse.
At first sight, items about breeds should be classified as subclass of (P279): horse (Q726), and each individual horse should be instance of (P31): a horse breed.
Example: Freiberger (Q673441) should have the statement subclass of (P279): horse (Q726), and Vaillant, a breeding stallion of that breed, should have the statement instance of (P31): Freiberger (Q673441).
The idea of better specifying the different roles an individual horse can have is interesting, and we should discuss extending the property occupation (P106) to categories other than humans alone. It is enough to modify the property's usage constraints, after an announcement/discussion on the property's talk page.
But to get this type of modification accepted, you need to present a clear concept for the property's use: in plain terms, on which type of items the property will be used, and a list of the main values associated with it: horse racing, or even sub-disciplines such as harness racing and hurdle racing, ..., breeding, or even other occupations such as service in the mounted police (I am sure one can find horses decorated for their role in the police), use as military draft animals, ...
In short, you need to analyse all the known and possible situations for horses and propose a general model capable of storing this information. Once this model is defined, you can launch all the large-scale modifications you want, justifying that your changes follow a decision taken by the Equines project.
That is my comment. Snipre (talk) 01:24, 9 May 2018 (UTC)[reply]
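As an illustration of the three-step edit described above, here is a hypothetical Python sketch that prints QuickStatements v1 commands (tab-separated; a leading "-" removes a claim). The racehorse QID and the item list are placeholders that would need to be checked against the real data before running anything:

    # Hypothetical: emit QuickStatements v1 commands for the three steps above.
    RACEHORSE = "Q2120578"    # placeholder: verify the actual "racehorse" item
    HORSE = "Q726"            # horse (Q726)

    items = ["Q123", "Q456"]  # placeholder QIDs of the affected horse items
    for qid in items:
        print(f"-{qid}\tP31\t{RACEHORSE}")   # 1) remove P31: racehorse
        print(f"{qid}\tP31\t{HORSE}")        # 2) add P31: horse
        print(f"{qid}\tP106\t{RACEHORSE}")   # 3) add occupation: racehorse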
For this kind of modification, I use PetScan rather than QuickStatements (but either works).
As for the breed, putting it in P31 seems a bad idea to me (I believe there have already been discussions on the subject, notably around crossbreeds and horses without a clearly defined breed).
And yes, this discussion should be continued on the corresponding project. PS: otherwise there is Wikidata:Bistro for general discussions in French ;)
Regards, VIGNERON (talk) 10:07, 9 May 2018 (UTC)[reply]

Project to map the open movement on Wikidata

Hi all

I'm part of the Mozilla Open Leaders program this year, and as part of the course I'm running a project this week to try to improve coverage of the open movement on Wikidata. I'm hoping that we can capture a lot of knowledge from different parts of the open movement, including open data, open source software, open hardware, open science, OERs, etc. This will both improve Wikidata and hopefully make it a more useful tool for mapping these different communities, helping them understand each other and work together better. We may even get a few new people editing Wikidata.

The main thing I'd like help with is sharing this tweet about the project to get contributions from a wide range of people. You are also very welcome to take part yourselves.

Whilst this is a short-term project, I've created Wikidata:WikiProject Open, and User:I JethroBT (WMF), User:NavinoEvans and I have created a much improved version of Wikidata:Dataset Imports to help this work happen over the longer term.

Thanks

--John Cummings (talk) 20:56, 8 May 2018 (UTC)[reply]

Inconsistency between redlists

There’s a question for query wizards over at en.wiki at WikiProject Women in Red. Can anyone shed some light there? Thanks. NotARabbit (talk) 02:40, 9 May 2018 (UTC)[reply]

@NotARabbit: Problem identified & sorted - I left a note in the WiR thread. --Tagishsimon (talk) 03:42, 9 May 2018 (UTC)[reply]
@Tagishsimon: Thank you! NotARabbit (talk) 03:49, 9 May 2018 (UTC)[reply]

Thank you for participating in the global Wikimedia survey!

Hello!

I would like to share my deepest gratitude to everyone who responded to the Wikimedia Communities and Contributors Survey. The survey has closed for this year.

The quality of the results has improved because more people responded this year. We are already working on analyzing the data and hope to have something published on Meta in a couple of months. Be sure to watch Community Engagement Insights for when we publish the reports.

We will also message those individuals who signed up on the Thank you page or sent us an email to receive updates about the report.

Feel free to reach out to me directly at egalvez@wikimedia.org or at my talk page on meta.

Thank you again to everyone for sharing your opinions with us! EGalvez (WMF) (talk) (by way of Johan (WMF) (talk)) 09:04, 9 May 2018 (UTC)[reply]

Thank you for spamming my watchlist three times. Sjoerd de Bruin (talk) 09:50, 9 May 2018 (UTC)[reply]

https://www.wikidata.org/w/index.php?title=Property_talk:P2600&action=history : this user is deleting content and switching "URL" to "ULR" ... a nasty user. 78.55.254.253 13:27, 9 May 2018 (UTC)[reply]

Several complaints about reverts and other edits on the talk page of that user:

Not showing any understanding of the problems:

  • "I use Google Translate.The phrase means "Musician of Iran"."
  • "You are the one who is vandalising by adding wrong descriptions"
  • "It makes no sense for the novice to give advice to the expert because the expert knows the correct"

78.55.254.253 13:37, 9 May 2018 (UTC)[reply]

While the user you cite here seems to have made some mistakes, at least he (David) is consistently editing with the same account. Your IP address (if you are a single person) seems to change by the minute, so if you make mistakes of that sort, it is impossible for anybody to track any pattern. Can you explain why you cannot edit with a regular account? Or at least sign your edits in a way that identifies you as an individual? ArthurPSmith (talk) 13:51, 9 May 2018 (UTC)[reply]
If you don't want IPs on Wikidata, then shut them down. "Seems to change by the minute": already from the page history alone one can see that this is a false claim made by you. And calling repeated section removal a "mistake" and not "vandalism" is a mistake in itself. In this version there is no section Mix'n'match and no section Quarry, and the former was linked to in user_talk, which causes trouble because some users may have seen the vandalized version. 77.179.86.244 11:16, 10 May 2018 (UTC)[reply]
Interesting that 77.179.86.244 is responding as if they were the same person as 78.55.254.253. That's quite a difference in IP address values; about 8.5 million potential IPv4 addresses separate the two. Can I suggest, if you really cannot or refuse to set up your own account, that you keep at least some sort of self-identifying signature or contact in your wikitext so your comments and edits can be identified? How does one notify you of a problem with something you have done? If I comment on the talk page of one of the IP addresses you use, will you see the comment? I noticed you (I assume it was you) mass-editing the formatter URLs of many properties a few days ago, and I was curious about the purpose, but there didn't seem to be any way to ask. While I haven't particularly noticed you making problematic edits, your criticism of others is sometimes harsh and unwarranted, and there's no way to follow up about that with you other than in a public location like this. It's a bit frustrating: all the rest of us are quite identifiable and addressable, but you are not. It has been suggested to shut down IP edits, but I know there's a lot of reluctance here about that, and up to now at least it's not something I've been in favor of. But I do wish habitual IP users would do something to be a little more identifiable, or perhaps just more restrained. ArthurPSmith (talk) 19:02, 10 May 2018 (UTC)[reply]
@ArthurPSmith: please have a look at the block log. The user has contacted me at dewiki afterwards, and I had a short discussion in German with them. I have no intention to change or lift the block. —MisterSynergy (talk) 19:08, 10 May 2018 (UTC)[reply]

Creating a list of Q values

I've been trying to populate the 349 NCAA (DI) women's basketball teams with several properties. So far so good (except for main category), but it is exceedingly boring to do manually. I took a look at QuickStatements and think that may be the way to go, but I don't know how to populate a spreadsheet with the Q-values for each team.

There's a table on this page; the first column contains a link to each Wikipedia article, but I don't know how to use that to automatically populate a spreadsheet with the Q-values. I'm hoping this is basic, and someone can tell me how to do it (or tell me if my general approach is wrong).--Sphilbrick (talk) 17:56, 9 May 2018 (UTC)[reply]

@Sphilbrick: - this report gives you the QIDs of 339 of the teams. Missing are the QIDs of
all of which are, in the templates, piped links to redirects. You'll need to visit each of these articles and click through to Wikidata to get the QID. HTH. Ping me if you have issues with any of the above or with QuickStatements. --18:49, 9 May 2018 (UTC)
Thanks, I may have some questions, but let me start with that.--Sphilbrick (talk) 19:18, 9 May 2018 (UTC)[reply]
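In case it helps with filling the spreadsheet, here is a minimal Python sketch that asks the English Wikipedia API for the wikibase_item page property; the redirects parameter makes it resolve the piped redirects mentioned above, and the sample title is only an illustrative assumption:

    import requests

    def qids_for_titles(titles):
        """Map enwiki article titles to Wikidata QIDs (up to 50 per request)."""
        r = requests.get("https://en.wikipedia.org/w/api.php", params={
            "action": "query",
            "prop": "pageprops",
            "ppprop": "wikibase_item",
            "redirects": 1,             # follow redirects to the target article
            "titles": "|".join(titles),
            "format": "json",
        })
        pages = r.json()["query"]["pages"]
        return {p["title"]: p.get("pageprops", {}).get("wikibase_item")
                for p in pages.values()}

    print(qids_for_titles(["UConn Huskies women's basketball"]))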

Wikidata linking to Wikipedia redux

An RFC at the English Wikipedia again brings up the issue of deleting any link reference to Wikidata for people who do not have their own entry in the English Wikipedia but appear in Wikidata: w:Wikipedia_talk:Manual_of_Style#New_RFC_on_linking_to_Wikidata. This RFC wants to remove hidden text with Q-number links for people whose articles have been deleted from the English Wikipedia but who still have entries in Wikidata, for example "William D. McDowell<!--Q18808343-->". Such an annotation lets an editor know that if an article is recreated, or a new entry is created for this person, an entry at Wikidata already exists. This will hopefully reduce duplication of Wikidata entries. It also allows Wikipedia to disambiguate people who appear in articles and lists but may never have Wikipedia entries. This way a person can know that, say, "John Smith, Mayor of Yourtown<!--Q123456-->" in an article on Yourtown is the same person as "John Smith, President of BigCompany<!--Q123456-->" in the article on BigCompany. It will allow someone who creates an article in the future to search the hidden text for the string "Q123456", find both entries, and create the properly disambiguated link. Please add your thoughts at the English Wikipedia RFC, no matter which way you feel about the issue. --RAN (talk) 01:52, 10 May 2018 (UTC)[reply]

@Richard Arthur Norton (1958- ): Why is this matter brought up on Wikidata? (And marked with a yellow header.) Pmt (talk) 11:37, 10 May 2018 (UTC)[reply]

Because it involves Wikidata; is that not self-evident from the use of the word "Wikidata" in the text? It is highlighted because it is not a "one and done" issue. Most issues brought up here only require one person to answer, and then the issue is done. Once this is no longer at the bottom of the list, people still need to see it and read it. --RAN (talk) 12:46, 10 May 2018 (UTC)[reply]
Be warned that normally, mentioning enwp RfCs about Wikidata here results in en:WP:CANVAS accusations. Thanks. Mike Peel (talk) 13:40, 10 May 2018 (UTC)[reply]
@Mike Peel: Can you quote me the passage in en:WP:CANVAS that you are referring to, and that you think I am violating? The word "Wikidata" does not appear in en:WP:CANVAS, and of course it says to add a notice on "the talk page or noticeboard of one or more WikiProjects or other Wikipedia collaborations which may have interest in the topic under discussion [and/or] a central location (such as the Village pump or other relevant noticeboards) for discussions that have a wider impact such as policy or guideline discussions." I believe that is exactly what I did. --RAN (talk) 14:19, 10 May 2018 (UTC)[reply]
@Richard Arthur Norton (1958- ): I'm just saying what tends to happen, see [11] and [12] from the infobox RfC. Thanks. Mike Peel (talk) 15:24, 10 May 2018 (UTC)[reply]
  • Oh, OK. I understand, thanks. I see the infobox wars are also continuing. I am amazed that it was a non-issue at Wikimedia Commons recently. --RAN (talk) 15:27, 10 May 2018 (UTC)[reply]
Removed the highlighting. I find it annoying, and no issue has ever required it to be brought to attention.--Micru (talk) 13:53, 10 May 2018 (UTC)[reply]
@Richard Arthur Norton (1958- ): Following your link, I end up on the English Wikipedia, reading a header New RFC on linking to Wikidata and a subheader RFC question Should we ban links to wikidata within the body of an article? In which way does this affect a user on Wikidata? Pmt (talk) 15:36, 10 May 2018 (UTC)[reply]
  • I think you are asking why I am bringing up the topic, since it involves English Wikipedia → Wikidata linking (more of interest to English Wikipedia users and less of interest to Wikidata users) and not Wikidata → English Wikipedia linking (more of interest to Wikidata users and less of interest to English Wikipedia users). I also mentioned that "this will hopefully reduce duplication of Wikidata entries", as I have multiple times recreated Wikipedia articles that were not notable at the time of deletion, and made a new Wikidata entry because I did not realize that a "Jimmy Smith" redlinked in an article was the same person as "Jim A. Smith", who had an entry already. You do not have to respond if the topic does not interest you. About 80% of all topics brought up here are of no interest to me, so I ignore them. You can do the same thing; it will save us both lots of time to devote to topics that do matter to us. RAN (talk) 16:01, 10 May 2018 (UTC)[reply]

@Richard Arthur Norton (1958- ): What I am reading is: Please add your thoughts at the English Wikipedia RFC no matter which way you feel about the issue. and ...why I am bringing up the topic since it involves English Wikipedia → Wikidata linking... I still maintain that things that specifically concern the English Wikipedia should not be discussed on Wikidata. So far you have involved 5 users at Wikidata, using their time, and you are further asking Wikidata users to add their thoughts at the English Wikipedia. Are the users of the English Wikipedia aware of your call for comments here at Wikidata? Pmt (talk) 16:56, 10 May 2018 (UTC)[reply]

I can only say it one more time: if you have no interest in this topic, please move on to the next topic. The "5 users" you mention can decide how best to use their time themselves. You are spending a lot of time writing about wasted time, which is the definition of irony. I have already addressed why it was posted in this venue. Please do not ask me the same question again; the answer will still be the same. Thank you. --RAN (talk) 17:08, 10 May 2018 (UTC)[reply]
Yes, they are searchable with "insource:". It searches the raw unformatted text. I changed the wording here and the RFD, calling it a link was incorrect, it really is a hidden annotation. RAN (talk) 16:09, 10 May 2018 (UTC)[reply]
It seems to be about both linking and showing the QIDs in the code - the latter is a fallback that some people have been using to preserve the info in the case that the links aren't allowed. Thanks. Mike Peel (talk) 17:05, 10 May 2018 (UTC)[reply]
As a Wikipedia editor first and Wikidata editor second, I am glad for this tip. I don't follow MOS talks as much as I used to do, so I would have missed this discussion. Syced (talk) 06:32, 11 May 2018 (UTC)[reply]

Request for a bot operator to run a bot

Many categories, including those for "National politicians in Africa", have a "category contains" statement with the value "human", followed by a qualifier saying what the category is about, for instance "position held": "President of Chad". My request is for a bot operator to run a bot for all the categories marked in this way, in every Wikipedia. Thanks, GerardM (talk) 12:08, 10 May 2018 (UTC)[reply]
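If it helps scope the bot request, here is a sketch of how such categories might be listed, assuming the structure described above maps to "category contains" (P4224) with value "human" (Q5) and a "position held" (P39) qualifier:

    import requests

    # Find categories stating "category contains: human" qualified by
    # "position held", as in the "President of Chad" example above.
    query = """
    SELECT ?category ?position WHERE {
      ?category p:P4224 ?stmt .
      ?stmt ps:P4224 wd:Q5 ;
            pq:P39 ?position .
    }
    LIMIT 100
    """
    rows = requests.get("https://query.wikidata.org/sparql",
                        params={"query": query, "format": "json"})
    for r in rows.json()["results"]["bindings"]:
        print(r["category"]["value"], r["position"]["value"])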

Wikidata - Data Resolver

Hi,

I am currently diving into the Wikidata project and am mainly interested in information on persons. While getting the JSON data is not a problem at all, parsing the information for further analysis is rather challenging. I am looking for a way to simplify the retrieval of information. For example, I would like to have specific functions like:

gender = person.get_gender()

For testing purposes I have written a simple Python base class and, deriving from that, a more specific person class. Here, for example, is the mentioned gender method of the person class:

    def get_gender(self):
        """Return a gender label for this item, or None if P21 is missing or unmapped."""
        GENDER_DICT = {
            "Q6581097": "male",
            "Q6581072": "female",
            "Q1097630": "intersex",
            "Q1052281": "transgender female",
            "Q2449503": "transgender male",
        }
        claims = self.json_item.get("claims", {})
        if "P21" not in claims:
            return None
        try:
            id_val = claims["P21"][0]["mainsnak"]["datavalue"]["value"]["id"]
        except (KeyError, IndexError):  # e.g. "novalue"/"somevalue" snaks
            return None
        return GENDER_DICT.get(id_val)  # None for QIDs not in the mapping

Does a library like this exist (for languages other than Python as well)? I'd rather use and contribute to an existing library than duplicate it.

Thanks and Greetings,

Niklas
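One existing option might be Pywikibot, which wraps the raw JSON shown above in higher-level objects; a minimal sketch (Q42 is just an example item, and the lookup mirrors the P21 logic in the class above):

    import pywikibot

    # Read P21 (sex or gender) from an item using Pywikibot's data layer.
    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, "Q42")   # Douglas Adams, as an example
    item.get()                               # fetch labels, claims, sitelinks

    for claim in item.claims.get("P21", []):
        target = claim.getTarget()           # an ItemPage such as Q6581097
        target.get()
        print(target.id, target.labels.get("en"))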

Making a query that takes all events with locations and page views from Wikipedia

Hi, my friend and I want to build an app that will show events on a map for a given period of time, but we are struggling with SPARQL, and I came here for help.

We want the query to take all events (including instances of subclasses of "event") that have coordinates, then get when each event happened, or between which dates, and then sort them by the page views of the corresponding Wikipedia pages.

But we can't find a way to add page views to the results. Also, we are not sure that this query gets all events.

Now we have https://query.wikidata.org/#PREFIX%20xsd%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0A%0A%23%20limit%20to%2010%20results%20so%20we%20don%27t%20timeout%0ASELECT%20%3Fevent%20%3FeventLabel%20%3Fdate%20%3Fcoordinate_location%20WHERE%20%7B%0A%20%20%3Fevent%20%28wdt%3AP31%2Fwdt%3AP279%2a%29%20wd%3AQ1656682.%0A%20%20OPTIONAL%20%7B%20%3Fevent%20wdt%3AP585%20%3Fdate.%20%7D%0A%20%20OPTIONAL%20%7B%20%3Fevent%20wdt%3AP580%20%3Fdate.%20%7D%0A%20%20BIND%28%28NOW%28%29%29%20-%20%3Fdate%20AS%20%3Fdistance%29%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%3Fevent%20rdfs%3Alabel%20%3FeventLabel.%0A%20%20%20%20FILTER%28%28LANG%28%3FeventLabel%29%29%20%3D%20%22en%22%29%0A%20%20%7D%0A%20%20FILTER%28%28BOUND%28%3Fdate%29%29%20%26%26%20%28%28DATATYPE%28%3Fdate%29%29%20%3D%20xsd%3AdateTime%29%29%0A%20%20FILTER%28%280%20%3C%3D%20%3Fdistance%29%20%26%26%20%28%3Fdistance%20%3C%20365%29%29%0A%20%20%20%7B%20%3Fevent%20wdt%3AP625%20%3Fcoordinate_location.%20%7D%0A%7D%0ALIMIT%2010

I only found the Wikimedia API that answers questions about page views. What are we missing here?

Thanks for help in advance. Wojtek, Poland
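Page views are not stored in Wikidata at all, so they cannot be selected or sorted on inside the SPARQL query; the usual workaround is to run the query first and then rank the results with the Wikimedia Pageviews REST API. A minimal Python sketch, where the article titles and the date range are illustrative assumptions:

    import requests

    def monthly_views(article, start="20180401", end="20180430"):
        """Sum monthly page views for one enwiki article via the REST API."""
        url = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/"
               "per-article/en.wikipedia/all-access/all-agents/"
               f"{article}/monthly/{start}/{end}")
        items = requests.get(url).json().get("items", [])
        return sum(i["views"] for i in items)

    # Rank event articles returned by the SPARQL query by recent popularity:
    events = ["Woodstock", "Battle_of_Grunwald"]   # illustrative titles
    print(sorted(events, key=monthly_views, reverse=True))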

Is it possible to mark a statement as problematic?

I often see statements that are probably wrong, but I don't have the time or competence to correct them. Is there a way to manually mark them as problematic?--Malore (talk) 01:23, 11 May 2018 (UTC)[reply]

Deprecate it, with a reason for deprecation. Ghouston (talk) 03:43, 11 May 2018 (UTC)[reply]
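For a scripted version of that suggestion, here is a sketch using Pywikibot: it deprecates a claim and attaches a "reason for deprecated rank" (P2241) qualifier. The sandbox item, the choice of claim and the reason QID are all placeholders to adapt and verify before use:

    import pywikibot

    # Deprecate a claim and record why (Q4115189 is the Wikidata sandbox).
    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, "Q4115189")
    item.get()

    claim = item.claims["P31"][0]            # placeholder: pick the bad claim
    claim.changeRank("deprecated")

    reason = pywikibot.Claim(repo, "P2241")  # reason for deprecated rank
    REASON_QID = "Q41755623"                 # placeholder: verify this item
    reason.setTarget(pywikibot.ItemPage(repo, REASON_QID))
    claim.addQualifier(reason)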