Wikidata:Contact the development team

Shortcut: WD:DEV
Wikidata development is ongoing. You can leave notes for the development team here, on the #wikidata IRC channel, or on the mailing list, or report bugs on Bugzilla. (See the list of bugs on Bugzilla.)

Regarding the accounts of the Wikidata development team, we have decided on the following rules:

  • Wikidata developers can have clearly marked staff accounts (in the form "Fullname (WMDE)"), and these can receive admin and bureaucrat rights.
  • These staff accounts should be used only for development, testing, spam-fighting, and emergencies.
  • The private accounts of staff members do not get admin and bureaucrat rights by default. If staff members desire admin and bureaucrat rights for their private accounts, they should obtain them through the processes developed by the community.
  • Every staff member is free to use their private account just as everyone else, obviously. Especially if they want to work on content in Wikidata, this is the account they should be using, not their staff account.


On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at August.


ISO-format date: Precision parameter

An edit changing the precision value to 7 (century) makes the date 1815-08-15 display as "18. century" instead of "19. century". How can this be fixed? Sealle (talk) 06:35, 18 August 2014 (UTC)

There really is a problem with dates: a date for "20. century" shows as 2000-01-01, when it should be 1900-01-01 (or 19.., as used in most databases, including Commons dates). The very strange result is that a person can be married in 1985 and born on 2000-01-01! Even if "precision" should make it read as 20. century, there is still a problem... :D --Hsarrazin (talk) 13:43, 20 August 2014 (UTC)
  • Just a little amendment: the beginning of the 20th century is 1901-01-01, not 1900-01-01. Sealle (talk) 14:20, 20 August 2014 (UTC)
Sealle, I agree: 1901 is the REAL first year of the century :) But that does not make it right to use the last year of the century to store the century value. It could be stored as 19.. or 1901, or better, as the approximate value with a precision that displays it as a century, without losing the approximate value: sometimes we know the decade, but only have year or century as the precision argument :S
I think it's VIAF that uses the following coding: 1800-1999 to signify "19th-20th century", i.e. from the beginning of the 19th to the end of the 20th, or 1800-1899 when birth AND death are both within the 19th century, without precision. Maybe it is not the "real" first year of the century, but at least it's clear: birth is always < other events, and death > other life events, which is the main point ;)
What I mean is: perhaps we should use the smaller value for "beginning dates" and the bigger one for "ending dates". This way, dates could be compared logically. --Hsarrazin (talk) 21:49, 20 August 2014 (UTC)
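
The century arithmetic in this thread can be made concrete. The following is a minimal sketch in Python, assuming the convention Sealle describes (the 20th century runs from 1901 to 2000). The function names are hypothetical, and nothing here is taken from the Wikibase code; in particular, the guess that a plain `year // 100` truncation could produce "18" for 1815 is only an assumption about where such an off-by-one might come from.

```python
from typing import Tuple


def century_of(year: int) -> int:
    """Return the ordinal century of a positive (CE) year, e.g. 1815 -> 19."""
    if year < 1:
        raise ValueError("this sketch only handles years >= 1")
    return (year - 1) // 100 + 1


def century_bounds(century: int) -> Tuple[int, int]:
    """Return (first_year, last_year) of an ordinal century, e.g. 20 -> (1901, 2000)."""
    if century < 1:
        raise ValueError("this sketch only handles centuries >= 1")
    first = (century - 1) * 100 + 1
    return first, first + 99


# Naive truncation (an assumed cause, not confirmed in the thread) is off by one.
naive = 1815 // 100

print(naive, century_of(1815), century_bounds(20))
```

With bounds like these, Hsarrazin's suggestion amounts to storing the lower bound for "beginning" dates and the upper bound for "ending" dates, so that a birth known only as "20th century" (lower bound 1901) still compares as earlier than a marriage in 1985.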

I wonder, is anybody going to fix an obvious error?! Sealle (talk) 21:16, 22 August 2014 (UTC)

Redirect

Maybe the autodescription of redirects has a problem? "Reindirizzamento a $4" (Italian for "Redirect to $4"): Q3760925, Q1510227 (Example) --ValterVB (talk) 07:40, 20 August 2014 (UTC)

Just for info: {{Q}} returns a script error if the item is a redirect (e.g. invalid ID (Q3760925)) --ValterVB (talk) 07:56, 20 August 2014 (UTC)
Is redirect operational yet, or is it just in testing? How do you use it? Is it automatic with merge.js? --Hsarrazin (talk) 13:49, 20 August 2014 (UTC)
It's operational, but available only through the API (a bot is needed), using api.php?action=wbcreateredirect&... --ValterVB (talk) 13:53, 20 August 2014 (UTC)
Does the merge gadget not use the API? Or is it just a matter of time before this gadget is adapted? :) --Hsarrazin (talk) 21:50, 20 August 2014 (UTC)
It only needs some time. --ValterVB (talk) 21:57, 20 August 2014 (UTC)

Delete a badge

How is it possible to delete a badge? --ValterVB (talk) 09:46, 21 August 2014 (UTC)

Deselect it and save via the special page. Deselecting can be done with "CTRL + left mouse click". --Lydia Pintscher (WMDE) (talk) 10:09, 21 August 2014 (UTC)

Two new badges

I have created 2 new badges, featured list (Q17506997) and Did you know article (Q17507019). How can they be supported? Maybe we can add arbitrary badges in the future. --GZWDer (talk) 10:13, 21 August 2014 (UTC)

I'd like some kind of community ok for them to be created. And then we can just add them to the config. --Lydia Pintscher (WMDE) (talk) 10:18, 21 August 2014 (UTC)
Having a badge for featured lists should basically not be critical, as it's just another kind of featured articles. Many Wikipedias do have three of those (featured articles, good articles and featured lists), and display this status in the interwiki links list, so we should have it in Wikidata instead of keeping this one interwiki information locally in the articles. I'm not sure whether there are some conflicts with different kinds of featured lists (compare Category:Featured lists (Q5873672) and Category:Wikipedia featured lists (Q8101833)), though.
For "Did you know", I don't see a benefit of such a badge. Is there any Wikimedia project that displays this information in its interwiki links? Is there any benefit for readers, authors or tools to know that a certain language version of an article was once linked in a certain section of the main page? --YMS (talk) 13:48, 21 August 2014 (UTC)
Portuguese Wikipedia is considering the removal of the "Anexo" namespace: pt:Wikipédia:Esplanada/propostas/Eliminação do domínio Anexo (26abr2014). Helder 17:26, 21 August 2014 (UTC)
I created recommended article (Q17559452), which is used in fi-wiki, da-wiki, se-wiki and sv-wiki. At least in fi-wiki it's the third-highest quality level of articles, after featured and good articles. --Stryn (talk) 16:03, 21 August 2014 (UTC)
So what about w:Wikipedia:WikiProject Council/Assessment FAQ? Most articles on enwiki are assessed based on their quality (not sure about other wikis); showing that seems a lot more interesting to me than DYK usage. --Jklamo (talk) 17:04, 21 August 2014 (UTC)

Two little annoyances

First, the gadget MediaWiki:Gadget-AuthorityControl.js is again not working normally in Firefox 31; it works only in debug mode. Browser console output looks like this:

The second one concerns the suggester: when I middle-click one of the suggestions, it doesn't open in a new tab. This feature seems to work only in the search box, but not in the value input box. Not tragic, but it would be nice to have it working there too :) --Micru (talk) 12:54, 21 August 2014 (UTC)

Hey :) We're looking into the issues with the gadget right now. For the suggester: Can you open a bug report please on bugs.wikimedia.org? Thanks! --Lydia Pintscher (WMDE) (talk) 13:55, 21 August 2014 (UTC)
Done! Bugzilla69908.--Micru (talk) 16:14, 22 August 2014 (UTC)

MonolingualTextValue

Could somebody explain why labels and descriptions serialize as {language, value}, while monolingualtext snak values serialize as {text, language}? How can they serialize differently? --JulesWinnfield-hu (talk) 19:26, 21 August 2014 (UTC)

Can we get an explanation? Is this how it will be forever? --JulesWinnfield-hu (talk) 10:20, 27 August 2014 (UTC)

Indeed a strange idea at first sight. But in a way, I can see some sense in it regarding monolingualtext. The actual (data) value is the combination of the text (string) and the language. So, to have (data)value: {value, language} would be confusing. It would probably make sense to use the same key (text), for labels, descriptions and aliases. But, maybe, there are plans that, as soon as multilingualtext is implemented, labels, descriptions and aliases will return multilingualtext values? That would probably resolve the situation. Random knowledge donator (talk) 11:17, 27 August 2014 (UTC)
These two things – labels, descriptions and aliases (also referred to as "fingerprint") on one side, mono- and multilingual text values on the other side – do not have anything in common and do not share any code. There are no plans to replace one with the other. I can see that the two concepts can be confused. But it's really important to look at them independently and create independent implementations for them. The difference in the serialization is partly unintentional and partly intentional. It just does not matter if they are the same or not because they do not and should not have anything in common. The different serialization makes this clear. --Lydia Pintscher (WMDE) (talk) 15:38, 27 August 2014 (UTC)
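
For anyone consuming both shapes, a small sketch in Python of how the two serializations named in the question could be read side by side. The key names ({language, value} and {text, language}) come from the question above; the helper names and the sample strings are hypothetical placeholders, not real item data, and the two readers are kept separate on purpose, in line with the answer that the concepts are independent.

```python
from typing import Dict, Tuple


def term_to_pair(term: Dict[str, str]) -> Tuple[str, str]:
    """Read a label or description shaped as {language, value}."""
    return term["language"], term["value"]


def monolingual_to_pair(value: Dict[str, str]) -> Tuple[str, str]:
    """Read a monolingualtext value shaped as {text, language}."""
    return value["language"], value["text"]


# Placeholder data for illustration only.
label = {"language": "en", "value": "example label"}
motto = {"text": "example text", "language": "en"}

print(term_to_pair(label))
print(monolingual_to_pair(motto))
```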

Allow setting different WikiProjects for badges

In Wikidata:Project chat#Using badges for article quality and importance, there is a request to store the related WikiProject for article quality and importance. --GZWDer (talk) 05:25, 25 August 2014 (UTC)

If there is consensus to create them I am happy to do that. --Lydia Pintscher (WMDE) (talk) 06:43, 25 August 2014 (UTC)

Wikidata broken by design?

Excuse me for re-activating the following discussion, but @Jeblad: made a quite impressive statement that illuminates the problems in a more technical fashion. I did not check back earlier since I am somewhat disappointed by the fact that this fundamental topic is not regarded as being as important as it should be. Random knowledge donator (talk) 09:56, 25 August 2014 (UTC)


Trying to get some answer here since the project chat discussion about how to properly capture uncertainty did not result in any valuable input.
I am still not sure how to properly capture those two cases of uncertainty I listed. An answer that Wikidata does and will not support that is fair enough - although, as far as I know, the intention of Wikidata was not to be a plain fact database but indeed allow modeling uncertainty. Listing the original description of the issues - any answer appreciated (please excuse the catchy headline, just trying to get some more attention than in the project chat):
The first one was on Abraham von Freising (Q330885): According to the reference, the person may have died either on 7 June 993 or 7 June 994. This could be reflected by using a time range or a data type specific qualifier like "alternative date". But, actually, these are two discrete values, basically a list of dates. Eventually, I added both which, at first, seems reasonable and was done before. However, when querying for people having died in 993, one would receive Abraham von Freising (Q330885) without any hint that this information is not certain. Consequently, when querying for people having died in 993, one would assume that this person, in fact, died in 993 and uncertainty becomes fact.
Another example is Wolfgang Carl Briegel (Q1523127). According to one reference, the person may have been a student of Johann Erasmus Kindermann (Q466635). Qualified by the same time range, I added "unknown value" and Johann Erasmus Kindermann (Q466635) for student of (P1066). However, split into two separate statements, that does not really reflect what the reference expresses and applying both statements, backed by the same reference, seems even odder than backing different values for date of death (P570) with the same reference. Expressing that Johann Erasmus Kindermann (Q466635) may have taught Wolfgang Carl Briegel (Q1523127) using student (P802) on Johann Erasmus Kindermann (Q466635) seems kind of impossible without some weird qualifier expressing "may be false". One could argue to just drop that uncertain information and use "unknown value" exclusively, but, well, that would be a loss of information and I am sure such problems occur in other situations as well (an example of a more prominent topic may be to model something like "Roger Godberd (Q7358238) might have been Robin Hood (Q122634)"). Random knowledge donator (talk) 06:56, 25 June 2014 (UTC)
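
The first example can be reproduced with a toy model. This is a minimal sketch in Python, assuming statements are plain records with an item, a property and a value; the structure and the function names are hypothetical and do not reflect the Wikibase data model or any query service. It only shows that a per-year query over two separate date of death (P570) statements returns the item with no hint that an alternative exists, unless the caller checks for alternatives explicitly.

```python
from datetime import date
from typing import Any, Dict, List

Statement = Dict[str, Any]

# Two discrete candidate dates of death for Abraham von Freising (Q330885).
statements: List[Statement] = [
    {"item": "Q330885", "property": "P570", "value": date(993, 6, 7)},
    {"item": "Q330885", "property": "P570", "value": date(994, 6, 7)},
]


def died_in(year: int, stmts: List[Statement]) -> List[str]:
    """Return items with any P570 statement falling in the given year."""
    return sorted(
        {s["item"] for s in stmts
         if s["property"] == "P570" and s["value"].year == year}
    )


def has_alternatives(item: str, prop: str, stmts: List[Statement]) -> bool:
    """Return True if the item carries more than one statement for the property."""
    return sum(1 for s in stmts if s["item"] == item and s["property"] == prop) > 1


print(died_in(993, statements))
print(has_alternatives("Q330885", "P570", statements))
```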

I think those are different cases and each should be treated differently. For instance, in the case of the date of death, I would mark it as "unknown value" with qualifiers earliest date (P1319)/latest date (P1326). For the second case you could propose a qualifier "source certainty" that would indicate how sure the sources are about the provided information.
But you shouldn't expect to get "ultimate answers". Anyone can give suggestions, and if you don't get feedback, that means that you can come up with a proposal of your own.
OTOH, I agree that Wikidata is broken by design; however, that applies not only to Wikidata but to any piece of software or reality-representation :) The trick is to move closer little by little every day and not to expect perfect data or knowledge, because by definition it doesn't exist. --Micru (talk) 08:30, 25 June 2014 (UTC)
Thanks for your answer. Using earliest date (P1319) and latest date (P1326) would imply a range though. "Source certainty" is a nice idea. However, one would need to define a constraint of exclusive values (which probably would need to be items to be machine-readable) and what would these values be? Items for "high certainty", "normal certainty", "low certainty"?
Technically, I would like to simply flag values that can be regarded uncertain. When issuing a query, these values could be marked/filtered/whatever easily. As for the first example, it would even be better to allow some kind of alternative values on single statements since the value is basically a list of possible values - but that is probably hard to model from a technical perspective. Flagging statements uncertain could, for example, be simply(?) achieved by extending the "value type" options though: "custom value" (as opposed to "no value" and "unknown value") would be split into something like "certain value" (default) and "uncertain value". In my opinion, the amount of uncertainty ("source certainty") should be left to the reference/content of the reference since capturing that is out of scope for Wikidata as it involves subjective rating. Random knowledge donator (talk) 14:07, 28 June 2014 (UTC)
Seems like my inquiry was not successful once again. Still, I think this is a fundamental problem. I do not demand that the issue has to be solved right now but it needs to be addressed. However, the only outcome of my question is that no one really cares. I refrain from editing data as long as there is no strategy to resolve such a fundamental issue. Random knowledge donator (talk) 08:46, 2 July 2014 (UTC)
Random knowledge donator, how do you expect it to be successful if you don't file a property proposal for whatever property you think could help you model uncertainty? I agree that it would be nice to have a confidence option for sources, but I am not the one setting the priorities, and I also think that for now we can do that with a property or a qualifier, so we can learn about the needs and possible uses. --Micru (talk) 09:31, 2 July 2014 (UTC)
Repeating myself: Personally, I do not think a property is appropriate. I would be fine if someone would explain how a property would solve the issue. Random knowledge donator (talk) 09:35, 2 July 2014 (UTC)
Random knowledge donator, if we create a new property it could be used as a qualifier: [qualifiers] expand on, annotate, or contextualize beyond just a property-value pair. Saying "date of birth: 1850" is not the same as saying "date of birth: 1850" with the qualifier "source certainty: low". Statement and qualifier together form a whole, and the statement is incorrect if you don't take both into account. --Micru (talk) 10:48, 2 July 2014 (UTC)
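
The point that statement and qualifier form a whole can be sketched as follows. This is a hedged illustration in Python: "source certainty" is only the qualifier proposed in this thread, not an existing property, and the record layout and function names are hypothetical. Reading only the property-value pair makes the two statements indistinguishable, while a reader that also looks at the qualifier can tell them apart.

```python
from typing import Any, Dict, List, Tuple

Statement = Dict[str, Any]

statements: List[Statement] = [
    {"property": "date of birth", "value": 1850},
    {"property": "date of birth", "value": 1850,
     "qualifiers": {"source certainty": "low"}},  # proposed qualifier, hypothetical
]


def bare_pairs(stmts: List[Statement]) -> List[Tuple[str, Any]]:
    """Read only the property-value pairs, ignoring qualifiers."""
    return [(s["property"], s["value"]) for s in stmts]


def confident_values(stmts: List[Statement], prop: str) -> List[Any]:
    """Return values for a property, skipping statements qualified as low certainty."""
    return [
        s["value"] for s in stmts
        if s["property"] == prop
        and s.get("qualifiers", {}).get("source certainty") != "low"
    ]


print(bare_pairs(statements))
print(confident_values(statements, "date of birth"))
```

The sketch also illustrates the objection raised in the replies: a consumer has to know that this particular qualifier exists, and which values it takes, before it can filter on it.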
I really appreciate your answers and understand your argumentation. However, having a "source certainty" property involves subjectivity by rating the amount of certainty of a source or the fact stated by the source (which even are two different things but that is more of a different story). And which values would be allowed for "source certainty"? Low, normal, high, very low? Ultimately, I would not support having such subjectivity in Wikidata. How is one supposed to rate the certainty of a reference anyway? That is a very scientific matter. In my opinion, the amount of uncertainty should not be subject of Wikidata - however, having a qualifier like "is uncertain" pointing to a boolean "true" seems pointless as well. Random knowledge donator (talk) 11:42, 3 July 2014 (UTC)
Random knowledge donator, when there is source uncertainty it happens mainly for two reasons: either the source is stating its self-assessed level of uncertainty, or the circumstances do not allow the information contained in the source to be properly considered (physical support degradation, obsolete methodology, wrong assumptions, etc). You could generalize both cases with a general "sourcing circumstances" qualifier with objective values like: significant self-assessed uncertainty, incomplete source, source ambiguity, etc. To model information that is disputed by other sources we already have statement disputed by (P1310). --Micru (talk) 08:01, 4 July 2014 (UTC)
OK, I get the point. Still, I have concerns though. Sorry! First off, a generic property is not really usable since users need to figure out upfront that (a) the property exists, (b) it is the one they are actually looking for and (c) what values are supposed to be used for the property. The concept is just really hard to understand, resulting in the property not being used at all. And what data type would the values of "sourcing circumstances" have? Are these supposed to be items, individual text or something else? Apart from that, one needs to be aware of the properties ("source uncertainty", "disputed by" and whatever is there and there to come) that mark uncertainty when querying to be able to filter those values eventually. And in the end, still, I think it involves too much subjectivity and detail. How can I judge that a source is incomplete, outdated or whatever? Yes, there are those really obvious matters like the flat earth theory - however, there are sources with much more subtle issues and the reason why a source may be regarded uncertain can be of diverse scientific matters and I would probably not put my head above the parapet and ascertain a reason why a reference may be regarded uncertain. Instead, I would recommend that the reader have a look at the original reference. Even more, in a secondary source, the reason why something is uncertain may not be supplied at all, like for the two dates of the example in the initial post. I am afraid the concept of using one or more properties to mark uncertainty still seems too complex and - please, excuse me - naive. However, I think the two of us are not getting towards a solution here... what about the developers anyway? Random knowledge donator (talk) 07:22, 8 July 2014 (UTC)
I would like to see something done with qualifiers first to see how it is being used. We can then decide about what to do next and if it is worth investing more time into and if it is worth complicating the user interface and data model for it. --Lydia Pintscher (WMDE) (talk) 09:21, 8 July 2014 (UTC)
A rank "uncertain" would be nice, but I do not know which property could represent that... suggestions welcome, Random knowledge donator. --Micru (talk) 19:28, 11 July 2014 (UTC)
No offense, but waiting for "something being done with qualifiers" is not really helpful. Personally, I do not see a sane way to get that resolved with qualifiers (see all my statements). I would rejoice if there is... Using an additional rank is problematic since that would interfere with the original concept of ranks (see discussion on the corresponding help page).
Still, I stick to another snak type technically being the most sane solution. If there is another method to flag statements - fine. But regarding qualifiers: Qualifiers do not allow flagging since snaks always consist of a property and a specific value (unless you choose another snak type - you get the point...). You would need to restrict the value to one particular item, which is true (Q16751793). Regardless of true (Q16751793) being a strange item, the method would be too technical, too complex and would not prove usable since, even more, you would never use false (Q5432619) for a property "is uncertain" - instead, you would just not assign the qualifier.
I really think that I made my point clear in all the lines above. If you want to see something made up with qualifiers - I cannot offer that since I am not convinced of that being a proper solution. And since nobody else seems to be interested in specifying uncertainty, we can also wipe that topic off the table since "doing something with qualifiers" is unlikely to happen unless you guys take action and figure out a proper solution.
If there is another proper solution (in terms of being logical and usable), yes, I would gladly accept it but, to me, it seems like uncertainty was not really considered in the original concept of the software. However, being labeled a "knowledge base" in contrast to a "fact database", I would suppose modeling uncertainty should be a core concept of Wikidata. Random knowledge donator (talk) 10:17, 16 July 2014 (UTC)
@Random knowledge donator: The other day I was doing some tests with a qualifier "type of statement" to specify uncertainty, universal quantification (∀) and existential quantification (∃). All these options are necessary (perhaps integrated into the software) if some day we want to move away from mere fact collection into the "knowledge base" realm. You can also do some experiments (create properties and items as needed) on the test instance of wikidata. See for instance:
--Micru (talk) 10:58, 16 July 2014 (UTC)
I get the point, but I am still not convinced, sorry. That seems to capture the logic, but at the price of poor usability. "type of statement" is far too generic for users to be aware of using it for specifying uncertainty. Given the huge amount of data that is supposed to be managed in Wikidata, Wikidata cannot rely on experts that have dug into the concepts. I think being able to represent uncertainty should be as obvious as possible. If not, users will simply not enter data or, probably even worse, enter data as if it was not uncertain... But maybe/probably I am the only one who regards that as important. Random knowledge donator (talk) 09:44, 17 July 2014 (UTC)
@Random knowledge donator: "Uncertainty" is not the only meta-information that statements require. There were also users requesting a system to protect statements; maybe both features belong together, I don't know. But without testing first what is needed we will not be able to know what to ask for. --Micru (talk) 09:57, 17 July 2014 (UTC)
(Sorry for editing in an archive, but I think this issue is more important.)
I agree with you that Wikidata is broken in this respect; there are a lot of things that aren't correct when it comes to the handling of simple values. I'll add bits and pieces as I read your text, but the general idea goes like this:
  • Our mental model of a value is quite complex, but we must represent it in some simple way
  • Our value can be a range or a bag of values, probably also an ordered set of values
  • Our value can have several uncertainties and error sources attached to them, and two values in a list might not use the same error model
  • Any value should refer to some kind of datum, but values that share the same dimension might be compared (not always: we know from sources how some relate to each other, but we don't know their absolute values)
Let me give an example: A box can have 3 lengths, those are width, depth and height. We could call them extent. We could then say
width: a
depth: b
height: c
or slightly better
extent: a
extent: b
extent: c
or perhaps even for a list of values
extent: {a b c}
If a, b, and c are 1, 2 and 3, then it might be valid to say
extent: [a c]
Those two last forms are very important if you want to keep the values together in a statement, and they are ordered and unordered sets. The first form is typically called a seq in rdf and the last form a bag.

But the values themselves (the a, b, and c), what if we need to model them more accurately? If we have simple values as the main snak then we can describe that value by using qualifiers; that is, it is a reified statement anyhow. But if we have separate values inside a bag or seq inside a main snak, then we need to create the values as blank nodes themselves and we put the additional stuff inside that blank node. The qualifiers for the statements as in the Wikidata UI will then refer to the whole bag or seq of stuff, while we keep the very specific additions inside the blank node. We sort of add another level of qualifiers.

(Note that we can add qualifiers to describe uncertainty for a value in a statement, but when that statement is multivalued this might not be appropriate.)
So in your case with Abraham von Freising (Q330885) you will have
died: { "7 June 993" "7 June 994" }
Simply two dates, no mystery at all except for two dates where most people would expect one. That can be solved with a reference to some publication that describes the situation. It will, however, be troublesome in some contexts where you want to use a single value.
Messing around with this opens up some quite funny simplifications. What about open-ended intervals for a birth date? The nice thing with this is that we can say something about a value without spiraling down the "we need yet another special property" path.
There is also a situation where a statement holds a reference to a vector. I played around with various representations of that, and it seems solvable. Some very important cases are where there exists some kind of statistical analysis, like a w:en:Five-number summary. Those represent something that would probably have been squashed into a single value in the present model.
There are several core concepts that should be implemented, and it should be possible to either write some parseable strings or it should be possible to build them some other way. Personally I like parseable strings, take a look at w:en:Well-known text for example.
I think some of the problems regarding the modeling of this come from a lack of expertise in statistics, probability and real-life vetting and analysis of data. It was simply expected that this was a simple problem with simple solutions, but it is not; it is quite complex. Jeblad (talk) 21:01, 17 August 2014 (UTC)
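
The multi-valued statement for Abraham von Freising (Q330885) described in the comment above can be sketched in Python. This is an illustration under assumptions: a `frozenset` stands in for the unordered "bag" (an RDF bag may also hold duplicates, which a set cannot), the function names are hypothetical, and none of this exists in the current data model. It shows both sides of the comment: the two dates stay together in one value, and a consumer that expects a single date has to decide what to do.

```python
from datetime import date
from typing import FrozenSet, Optional

# died: { "7 June 993" "7 June 994" } kept together as one multi-valued statement.
died: FrozenSet[date] = frozenset({date(993, 6, 7), date(994, 6, 7)})


def matches_year(values: FrozenSet[date], year: int) -> bool:
    """Return True if any candidate date falls in the given year."""
    return any(d.year == year for d in values)


def is_uncertain(values: FrozenSet[date]) -> bool:
    """Return True if the statement holds more than one candidate value."""
    return len(values) > 1


def single_value(values: FrozenSet[date]) -> Optional[date]:
    """Return the date only if exactly one candidate exists, else None."""
    return next(iter(values)) if len(values) == 1 else None


print(matches_year(died, 993), is_uncertain(died), single_value(died))
```

Unlike two separate statements, a query for deaths in 993 can return the match together with the fact that it is one of several candidates.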
Thanks a lot, Jeblad, for the extensive write-up. I agree with your concept of modeling. Featuring collections of values/multivalued statements would be the most sane way of representing uncertainty when it comes to representing possible alternatives, like for the dates in the example. It would be a step in the right direction, although there would still be the need for a solution regarding uncertain values without alternatives (assumptions), like the other example of "someone might have been someone else". However, with multivalued statements, the impact on the data model would probably be quite huge and, judging by the attention the topic receives, I am convinced that this will not be implemented in the foreseeable future and that the problem will not be highlighted before there is already a lot of misleading data in the database... Random knowledge donator (talk) 09:56, 25 August 2014 (UTC)
@Jeblad, Random knowledge donator: I've got these ideas, but I don't know if they would be practical. This is also related to some properties that take "items" as values, when sometimes just a string would be enough (P:P1420, I'm looking at you...). --Micru (talk) 13:51, 25 August 2014 (UTC)

Why is the diff size so big?

[1] --Infovarius (talk) 13:48, 27 August 2014 (UTC)

That's due to the new internal serialization. It is a bit more verbose. It's fine :) --Lydia Pintscher (WMDE) (talk) 15:36, 27 August 2014 (UTC)

No Label

This page is full of template errors. Each link says "no label": Wikidata:WikiProject Medicine/Properties - Tobias1984 (talk) 16:09, 27 August 2014 (UTC)

Likely because the Lua module creating it is relying on the internal serialization. This changed as announced. The module needs to be adapted. --Lydia Pintscher (WMDE) (talk) 16:41, 27 August 2014 (UTC)