Shortcut: WD:DEV

Wikidata:Contact the development team


Wikidata development is ongoing. You can leave notes for the development team here, on IRC in #wikidata, or on the mailing list, or report bugs on Phabricator. (See the list of open bugs on Phabricator.)

Regarding the accounts of the Wikidata development team, we have decided on the following rules:

  • Wikidata developers can have clearly marked staff accounts (in the form "Fullname (WMDE)"), and these can receive admin and bureaucrat rights.
  • These staff accounts should be used only for development, testing, spam-fighting, and emergencies.
  • The private accounts of staff members do not get admin and bureaucrat rights by default. If staff members desire admin and bureaucrat rights for their private accounts, those should be gained by going through the processes developed by the community.
  • Every staff member is free to use their private account just like everyone else, obviously. In particular, if they want to work on content in Wikidata, this is the account they should use, not their staff account.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2018/11.



Monolingual string language code ("cnr")

Please add "cnr" for Q7700307#P1705. I found it on the enwiki article about Montenegrin (Q8821).
--- Jura 11:54, 25 January 2018 (UTC)

I've created T185800 for this and a patch will follow soon. But before it can be merged, approval by LangCom is needed. Mbch331 (talk) 10:30, 27 January 2018 (UTC)
@Millosh: what do you think? It's a request for a language code at Wikidata (not a new wiki/interface language).
--- Jura 17:20, 28 January 2018 (UTC)
It's a valid language code (per [1]) and I see no reason why it shouldn't be added. --Millosh (talk) 19:14, 2 February 2018 (UTC)
Is that a personal statement or an official statement from the langcom? If the latter, please add a statement to phab:T185800. Mbch331 (talk) 20:15, 2 February 2018 (UTC)
@Mbch331: thanks for the patch. I think you can go ahead with it. @Millosh: is the m:Language_committee#Current member knowledgeable about this language (or closely related ones)? --
--- Jura 07:20, 3 February 2018 (UTC)
@Jura1: My part of the work is done anyway. I don't have the right to merge patches. That's up to the WMDE devs. I'm just a volunteer dev who submits patches. Mbch331 (talk) 07:23, 3 February 2018 (UTC)
  • It still needs to be addressed.
    --- Jura 10:33, 12 March 2018 (UTC)
    • I've added a comment to the ticket that there is Langcom approval. Mbch331 (talk) 13:02, 12 March 2018 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. Patch for T185800 has been merged. Next passing of the train it will be live. Mbch331 (talk) 18:42, 15 March 2018 (UTC)

{{Section resolved|1=--Lydia Pintscher (WMDE) (talk) 08:23, 23 April 2018 (UTC)}}

  • Finally, what was it?
    --- Jura 11:14, 24 April 2018 (UTC)
We're still working on solving this issue. It's a tough one. Lea Lacroix (WMDE) (talk) 12:24, 24 April 2018 (UTC)
  • It actually works. So what was needed to make it work?
    --- Jura 09:42, 1 May 2018 (UTC)
    • Still open.
      --- Jura 09:00, 31 May 2018 (UTC) --- Jura 09:42, 1 May 2019 (UTC)

Placement in identifier section

To detail a point brought up earlier, is it possible to add an option to place statements of a few properties (e.g. P1036, sample use: Q64#P1036) in the identifier section of pages (in the sample Q64#identifiers). Currently the sort seems to be based exclusively on datatype.
--- Jura 06:52, 17 February 2018 (UTC)

Why not simply change P1036 to external-id datatype? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:51, 17 February 2018 (UTC)
Dewey Decimal Classification (P1036) is not an identifier but a classifier. An identifier identifies a unique instance of a class. In contrast, a classifier is used to tag all instances of a class. For example, the code 813.54 means American narrative prose from 1945 to 1999. If P1036 were an identifier, we would add that code to the item about the class of American narrative prose from 1945 to 1999. But since P1036 is used as a classifier, we add the code to all instances of that class. --Pasleim (talk) 14:10, 17 February 2018 (UTC)
It would be more useful to have such classes on Wikidata and to use this property only on the class item. This would allow us to encode the class definition as statements in Wikidata and to infer instances automatically, for example with {{Implied instances}}. Or to add the statements in the class item automatically to its explicit instances, without having to put the logic into the bot (or an out-of-wiki inference rule). author  TomT0m / talk page 14:17, 17 February 2018 (UTC)
In one sense, certainly; but we have plenty of other external-id properties on Wikidata that "identify" classes of things. Do we really add Dewey codes to every item about a book with that code? The examples on the property page certainly do not suggest that. I also note that, in the property proposal, User:Merrilee of OCLC said "The Dewey identifiers are just that -- identifiers... I had an in depth conversation with the Dewey Editor, Michael Panzer, last week and this use of Dewey identifiers as free for all to use lines up with his understanding as well.". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:17, 17 February 2018 (UTC)
  • I think the datatype of the individual properties to which this could apply should be discussed elsewhere. I think there is a general need for this.
    --- Jura 08:24, 20 February 2018 (UTC)
  • It is currently possible to create a section based on a list of datatypes or on a list of properties but not both. And I don't think we can spend time on it without having a good reason why these properties can't just be converted to external identifier. --Lydia Pintscher (WMDE) (talk) 16:26, 7 March 2018 (UTC)
    • What could we do about exact match (P2888)? There are obviously some that could be converted, but others that can't. There is also the group of identifiers that emerged during the conversion discussion and remained with string datatype.
      --- Jura 08:54, 13 March 2018 (UTC)
      • I'd say as a next step the discussions about the remaining ones need to be concluded. I am not sure what they are held up on. --Lydia Pintscher (WMDE) (talk) 10:45, 15 March 2018 (UTC)
        • Even for the ones that are already in the "not to be converted section", some are identifiers and have formatter urls. BTW exact match (P2888) has url datatype and I think it should be in the identifier section.
          --- Jura 11:34, 15 March 2018 (UTC)
          • @Lydia Pintscher (WMDE): can we move ahead with this?
            --- Jura 07:01, 19 April 2018 (UTC)
            • I am not sure what I can do. It needs editor consensus and then I can move ahead with scheduling the conversion. Or am I missing something? --Lydia Pintscher (WMDE) (talk) 08:25, 23 April 2018 (UTC)
              • What would you want to convert exact match (P2888) into?
                --- Jura 11:15, 24 April 2018 (UTC)
                • It seems fine to me as it is but I might be missing something. In general this is a decision I'd like the editors to come to rough consensus on and if that doesn't work I can make a decision but only then. It takes me time to get into the details of the discussion and make the right overall decision and I'd rather not do that if not absolutely needed. --Lydia Pintscher (WMDE) (talk) 09:45, 26 July 2018 (UTC)
                  • Other than you, I don't think anyone wanted to convert exact match (P2888) to another datatype (currently URL). The request here is to place it in the identifier section. The same goes for some of the properties with string datatype. It seems fairly trivial as a request. I get the impression that Wikidata development might have been absorbed by lexemes.
                    --- Jura 07:56, 27 July 2018 (UTC)
  • Seems this is still open.
    --- Jura 09:05, 31 May 2018 (UTC)
    --- Jura 11:15, 24 April 2019 (UTC)

Formatter url and string datatype

A problem brought up further up on this page: it seems that this works sometimes and sometimes it doesn't. What is the development status on this? I think ideally this would work as it does for the "external id" datatype when the URL is set on a property.
--- Jura 08:30, 20 February 2018 (UTC)

I checked and it is possible to do. Can you give me 2 or 3 examples where this would hold and not be an external identifier? I need to look into some of the details because of the RDF export before someone can work on it. --Lydia Pintscher (WMDE) (talk) 10:59, 15 March 2018 (UTC)
Full list: https://query.wikidata.org/#SELECT%2a%7B%3Fp%20wikibase%3ApropertyType%20wikibase%3AString%3Bwdt%3AP1630%5B%5D%7D
--- Jura 11:41, 15 March 2018 (UTC)
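For readability, the query embedded in that link is URL-encoded; a small standard-library sketch to decode it (the encoded string is copied from the link above):

```python
from urllib.parse import unquote

# URL-encoded SPARQL query taken from the query.wikidata.org link above.
encoded = "SELECT%2a%7B%3Fp%20wikibase%3ApropertyType%20wikibase%3AString%3Bwdt%3AP1630%5B%5D%7D"

decoded = unquote(encoded)
print(decoded)
# → SELECT*{?p wikibase:propertyType wikibase:String;wdt:P1630[]}
# i.e. all properties of string datatype that have a formatter URL (P1630).
```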
Ok should be doable. Can you open a ticket for it? Or Léa? --Lydia Pintscher (WMDE) (talk) 14:34, 13 April 2018 (UTC)
Léa? That would be helpful. Apparently, I can't write tickets. Besides, I surely wouldn't find it if someone had already opened one.
--- Jura 14:53, 13 April 2018 (UTC)
done, phabricator:T192188 --Pasleim (talk) 00:25, 14 April 2018 (UTC)


I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. --Lydia Pintscher (WMDE) (talk) 08:26, 23 April 2018 (UTC)
  • The point that might be missing in the phab ticket is that when the formatter url is qualified with a regex, then this should be taken into account. What would be the next steps?
    --- Jura 11:17, 24 April 2018 (UTC)
  • Seems this is still open.
    --- Jura 09:05, 31 May 2018 (UTC)
    --- Jura 11:15, 24 April 2019 (UTC)
    @Jura1: to hold up bot auto-archiving with an updated signature, you can just replace the old one - correct me if I'm wrong. --Liuxinyu970226 (talk) 11:02, 29 September 2018 (UTC)

Wikidata:Development plan

@Lea I think Wikidata:Development plan needs some curation... --Succu (talk) 20:38, 20 October 2018 (UTC)

Thanks for the reminder. We're currently working on the plan for 2019 and beyond; as soon as I have information that I can share, I will update the page. Lea Lacroix (WMDE) (talk) 07:20, 21 October 2018 (UTC)

@Lydia Pintscher (WMDE): thanks. I didn't comment earlier as it's actually a complicated stage for Wikidata.

  1. Bearing in mind that most basic features are in place, the question now is how to ensure that Wikidata can grow by orders of magnitude. We need to ensure we can do that in terms of capacity (server capacity, etc., and tools to import and edit) and in terms of being able to handle quality.
  2. I can't really comment on what's technically needed to grow by another order of magnitude, but users (like me) should come to a good understanding of the other points. I will try to comment on this aspect another day. Syncing with other databases could be a feature for that.
  3. Of the basic features, maybe time precision, quantity range and astronomical coordinates could be addressed.
  4. For interested Wikipedias, there should be a clear way to ensure infoboxes and annotated lists can be used.

Some of the points in the plan might take a few months to realize; for others we might just see a step within the next year. Obviously, there are always some tweaks to the GUI and other things that come in, but these might not necessarily be key. --- Jura 05:58, 14 November 2018 (UTC)

@Lydia Pintscher (WMDE): Thanks for posting the plan. Like Jura, I also believe that Wikidata has most basic features in place, and I hope that it can be used as a promotion tool outside of the Wikimedia world. For instance, it would be nice to establish contacts with the corporate world and start some kind of pilot project (like Wikidata:FactGrid was for the humanities) to see what the needs of the projects they might envision are, so that eventually there are more Wikibase users.

I find some of the tasks in the roadmap a must (citoid, easier queries, federation, client editing, and mobile web support), but there are other tasks that could be there as well, for instance building the tools to enable the Wiktionaries to reuse the lexicographical data generated or imported here. It is also not very clear which data the Wiktionaries might need (I suppose initially translation lists), or even how to navigate from items to Lexemes.

I would also appreciate some long-term investment in research. The Wikidata structure has limits in representing knowledge, and if some day we want to achieve an "abstract Wikipedia", we'll need to think beyond yearly plans.--Micru (talk) 21:54, 14 November 2018 (UTC)

LDF client reliability

How experimental is http://ldfclient.wmflabs.org/#datasources=https%3A%2F%2Fquery.wikidata.org%2Fbigdata%2Fldf ? It seems an interesting alternative to the query service for lengthy queries; unfortunately, it seems to silently fail to get all results and stop for an unknown reason while the process is not finished (if I'm right). Is that why you don't really advertise this endpoint? author  TomT0m / talk page 16:23, 31 October 2018 (UTC)

It should work, but it isn't getting that much feedback. So if you use it and run into issues, please do report them (ideally on Phabricator) so they can be fixed. --Lydia Pintscher (WMDE) (talk) 17:22, 31 October 2018 (UTC)

Misalignment of SPARQL

I found a lot of items that were deleted some days ago, but the query service still returns these elements in the results. For example, in this list generated today you can find a lot of items that were deleted as long as 10 days ago. Is it possible to solve this? --ValterVB (talk) 18:43, 6 November 2018 (UTC)

@Smalyshev (WMF): Any idea what's going on? --Lydia Pintscher (WMDE) (talk) 12:59, 9 November 2018 (UTC)
Looks like some updates have been missed, I'll take a look. Smalyshev (WMF) (talk) 19:09, 11 November 2018 (UTC)
Any news about this? I think the problem continues; I still find items deleted 2 days ago. --ValterVB (talk) 19:07, 14 November 2018 (UTC)

Time-series properties - Wikidata UI performance

Dear development team, we are discussing on the WD project chat the modeling of statistical properties with time series, like nominal GDP (P2131). One user mentioned here that Wikidata may not be well suited to time-series data. I have made an estimation for already available properties and for new (not yet proposed) properties for which we have lists in Wikipedia. For example, for United States of America (Q30):

  • If we import data for all available properties for the USA (population, inflation rate, etc.), as we have already done for nominal GDP (P2131), then we would have ca. 1650 new statements (old values for these properties remain, marked deprecated)
  • If I count statements for new (not yet proposed) properties that are used in Wikipedia, then I come up to ca. 1800 new statements in total

This is just a rough estimation, not counting other new properties that could be added in the future.

Can the Wikidata UI handle this? --Datawiki30 (talk) 20:27, 6 November 2018 (UTC)

Are we talking about 1800 statements on one item? Then Special:LongPages and https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel-statements?refresh=30m&orgId=1 say yes it is possible. Will it make you happy? Very unlikely - especially on high-profile pages that get visited a lot. As has been said on Project chat Wikidata isn't very good at this kind of high-granularity time-line data. It's not what it's made for. (And it can't be good for everything. We have to make trade-offs.) --Lydia Pintscher (WMDE) (talk) 13:04, 9 November 2018 (UTC)
Hi Lydia, and thank you for your answer. Where is the bottleneck - the database, the data model, or the UI? If the UI is the bottleneck, wouldn't it be possible to first query the total number of statements and then decide: a) if the number of statements is below x, the current UI is used, and b) if the number of statements exceeds x, only a few statements per property are shown and an AJAX function (or something like that) is used to show more statements if needed? Could this help? --Datawiki30 (talk) 22:10, 9 November 2018 (UTC)
In the future possibly yes. But it's not something we can do anytime soon unfortunately. --Lydia Pintscher (WMDE) (talk) 14:27, 10 November 2018 (UTC)
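The fallback Datawiki30 proposes above could be sketched roughly as follows. This is only an illustration of the idea, not anything in the Wikibase UI code; the function name, the threshold value, and the preview size are all made up here:

```python
# Hypothetical sketch of the proposed fallback: render everything when an
# item is small, otherwise show a short preview per property and defer the
# rest to a later "load more" request (the AJAX step in the proposal).
STATEMENT_THRESHOLD = 1000   # the "x" in the proposal; value is invented
PREVIEW_PER_PROPERTY = 3     # statements shown up front per property

def render_statements(statements_by_property):
    """Return (shown, deferred): statements to render now, and a count of
    deferred statements per property to fetch on demand."""
    total = sum(len(v) for v in statements_by_property.values())
    if total <= STATEMENT_THRESHOLD:
        # Small item: render all statements, nothing deferred.
        return statements_by_property, {}
    shown, deferred = {}, {}
    for prop, values in statements_by_property.items():
        shown[prop] = values[:PREVIEW_PER_PROPERTY]
        if len(values) > PREVIEW_PER_PROPERTY:
            deferred[prop] = len(values) - PREVIEW_PER_PROPERTY
    return shown, deferred
```

With the ~1650 time-series statements estimated above on one property, such a scheme would render only a handful up front and defer the rest.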

Current rate limits

Hey dev team, after all the trouble with dispatching throughout the past year, there were edit rate limits in place until recently (IIRC: 80 or 90 per minute in times of low max lag). Is this still the case? (Probably not, as User:QuickStatementsBot operates at 180/min on average over the past 24 hours [2].)

So I’d like to ask which limits apply at the moment, and where I can look up this value at any time. I’m considering speeding up my bot account as well, but I do not want to run into trouble at all… —MisterSynergy (talk) 16:30, 8 November 2018 (UTC)

@Ladsgroup: Can you help? Ideally with a link so people can look it up themselves in the future? <3 --Lydia Pintscher (WMDE) (talk) 13:07, 9 November 2018 (UTC)

Hey, the way to handle it is to just respect maxlag. Stop editing once the maxlag is more than 5 "seconds". Hope this helps Amir (talk) 14:05, 9 November 2018 (UTC)

Thanks! So "No limits" in case everything runs smoothly :-)
However, are you sure about "5 seconds"? It used to be 5 minutes before the value was recently reduced to the current range; WD:Bots still talks about "60" (seconds), which has always been an outdated value of course. —MisterSynergy (talk) 14:57, 9 November 2018 (UTC)
I think the documentation was misunderstood. You need to provide maxlag=5 to the API when making an edit; the API returns an error when the server lag is actually higher than what you sent (in this case 5). You then need to wait for some time and try again. The 5 can be seconds of replication lag (between master and replicas) or minutes of dispatch lag between Wikidata and the median of its clients (so maxlag=5 would return an error if either of these happens). Amir (talk) 17:16, 9 November 2018 (UTC)
Okay thanks, I think I understand most of it. As I am using pywikibot in standard configuration, I should be on the safe side anyways. I have observed auto throttling due to high server load several times in the past. —MisterSynergy (talk) 17:58, 9 November 2018 (UTC)
@Amir: If maxlag=5 is a requirement, why does this parameter exist at all? --Succu (talk) 21:04, 10 November 2018 (UTC)
User:Succu: Technically we want to give as much freedom as possible. In other words, we let users/the community decide how to handle the pressure. Amir (talk) 14:54, 14 November 2018 (UTC)
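The protocol Amir describes (send maxlag=5 with each edit, back off when the API reports it is lagged) can be sketched like this. A minimal sketch only: `edit_with_maxlag` and `is_maxlag_error` are names chosen here, not part of any library, and the actual edit request (which should carry the maxlag=5 parameter) is abstracted into a callback:

```python
import time

def is_maxlag_error(api_response):
    """True if a MediaWiki API JSON response is a maxlag rejection.

    A maxlag rejection has the shape
    {"error": {"code": "maxlag", "info": "...", "lag": ...}}."""
    return api_response.get("error", {}).get("code") == "maxlag"

def edit_with_maxlag(do_edit, max_retries=5, wait_seconds=5):
    """Call do_edit() (which should send maxlag=5 with the request) and
    retry after a pause whenever the server reports it is lagged."""
    for _ in range(max_retries):
        response = do_edit()
        if not is_maxlag_error(response):
            return response
        # Server is lagged: wait and try again, as the thread above advises.
        time.sleep(wait_seconds)
    raise RuntimeError("server stayed lagged; giving up")
```

This is essentially what pywikibot's standard configuration does automatically, which is why MisterSynergy observed auto-throttling under high server load.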

Unhelpful error message returned by some Lua functions

I suppose it has already been discussed, but I can't find it. The most annoying thing is that it does:

  • mw.wikibase.getEntityObject("QXX") -> "The ID "QXX" is unknown to the system. Please use a valid entity ID."
  • mw.wikibase.getAllStatements("Q1", " P31"): -> "failed to serialize data"

I think the first of those messages has been slightly improved over a previous version, but still... The really annoying thing is that there is no backtrace inside the Lua code, as there is for other Lua errors. -Zolo (talk) 14:52, 10 November 2018 (UTC)

Root Category - PT wikipedia dump

Hi!

I am developing a Natural Language Processing (NLP) project and using Hadoop/MapReduce/Google Cloud to process wiki dumps. I've used the Wikipedia Miner project (https://github.com/dnmilne/wikipediaminer/wiki) to extract the English dump file (https://dumps.wikimedia.org/enwiki/20181020/enwiki-20181020-pages-articles-multistream.xml.bz2). Wikipedia Miner needs a configuration file specifying the Wikipedia root category to extract. I used the category Contents and it works fine. My real problem happens when trying to process the Portuguese dump (https://dumps.wikimedia.org/ptwiki/20181020/ptwiki-20181020-pages-articles-multistream.xml.bz2). I used the category Conteúdo as the root category and it doesn't work. I tried several changes and combinations for the Portuguese root category (Conteúdos, Categoria:Conteúdos, Artigos, Conceitos, etc.) and nothing worked. Could someone help me, please? I need this dump extraction for my PhD research.

Valtemir - PhD research, valtemir.alencar@gmail.com
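One way to find the right root category without guessing names is to scan the dump itself for category pages that are not placed in any parent category. A rough sketch, assuming the usual `<page>/<title>/<revision>/<text>` layout of Wikimedia XML dumps (the real dump's XML namespace declaration is ignored here for simplicity, and `find_root_categories` is a name invented for this example):

```python
import re
import xml.etree.ElementTree as ET

def find_root_categories(xml_text, category_prefix="Categoria:"):
    """Return titles of category pages that contain no parent-category
    link in their wikitext - candidates for a dump-extraction root.

    Simplified: assumes namespace-free <page><title>...<revision><text>
    elements, unlike a raw Wikimedia dump which declares an XML namespace."""
    roots = []
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title", default="")
        if not title.startswith(category_prefix):
            continue
        text = page.findtext("revision/text", default="") or ""
        # A category page with no [[Categoria:...]] link has no parent.
        if not re.search(r"\[\[" + re.escape(category_prefix), text):
            roots.append(title)
    return roots
```

Running this over the category pages of the ptwiki dump (or a sample of it) would show which category actually sits at the top of the tree there, instead of trial and error with names like Conteúdo.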