Shortcut: WD:DEV

Wikidata:Contact the development team

Contact the development team

Wikidata development is ongoing. You can leave notes for the development team here, on the #wikidata IRC channel, or on the mailing list, or you can report bugs on Phabricator. (See the list of open bugs on Phabricator.)

Regarding the accounts of the Wikidata development team, we have decided on the following rules:

  • Wikidata developers can have clearly marked staff accounts (in the form "Fullname (WMDE)"), and these can receive admin and bureaucrat rights.
  • These staff accounts should be used only for development, testing, spam-fighting, and emergencies.
  • The private accounts of staff members do not get admin and bureaucrat rights by default. If staff members desire admin and bureaucrat rights for their private accounts, those should be gained going through the processes developed by the community.
  • Every staff member is free to use their private account just as everyone else, obviously. Especially if they want to work on content in Wikidata, this is the account they should be using, not their staff account.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2016/02.


IMDb event URL problem

See here. --Jobu0101 (talk) 21:21, 16 January 2016 (UTC)

As User:Mbch331 told me, it's not possible to deal with that problem for the moment because Wikidata can't currently handle multiple formatter URLs. Are there plans to update the software? --Jobu0101 (talk) 21:32, 17 January 2016 (UTC)
It seems the problem can be solved here by using another URL pattern for all of them? It is not currently on the short-term todo to invest more time into more sophisticated linking. --Lydia Pintscher (WMDE) (talk) 09:27, 18 January 2016 (UTC)
No, there is no single URL pattern that works for all of them. That's the whole problem. If there were, there would be an easy solution. The URL pattern is http://www.imdb.com/<type of page>/<value of P345>. And the type of page (name, character, title, event) isn't part of the IMDb id (but can be derived from the id). Mbch331 (talk) 09:37, 18 January 2016 (UTC)
Ah ok. What would be needed on our side to make it work? --Lydia Pintscher (WMDE) (talk) 09:46, 18 January 2016 (UTC)
The first two characters of the IMDb id define the <type of page>. For example nm -> name, tt -> title, ev -> event. For each type we would need a different URL prefix to which the id is appended. --Jobu0101 (talk) 10:03, 18 January 2016 (UTC)
The full list is: nm -> name, tt -> title, ev -> event, co -> company, ch -> character (first 2 letters of IMDb id -> type of page as currently used by IMDb). Mbch331 (talk) 10:16, 18 January 2016 (UTC)
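The prefix rule described above can be sketched in a few lines of Python. This is only an illustration of the mapping, not how Wikidata's formatter URLs work: the function name imdb_url and the sample id are made up for the example, and the table contains only the five prefixes listed in this thread.

```python
# Prefix -> page type, exactly as listed in the discussion above.
IMDB_PAGE_TYPES = {
    "nm": "name",
    "tt": "title",
    "ev": "event",
    "co": "company",
    "ch": "character",
}


def imdb_url(imdb_id: str) -> str:
    """Build an IMDb URL from a P345 value (hypothetical helper)."""
    prefix = imdb_id[:2]
    if prefix not in IMDB_PAGE_TYPES:
        # Fail loudly instead of guessing a page type for unknown prefixes.
        raise ValueError(f"unknown IMDb id prefix: {prefix!r}")
    return f"http://www.imdb.com/{IMDB_PAGE_TYPES[prefix]}/{imdb_id}"


if __name__ == "__main__":
    # "tt0000001" is just a sample value used for illustration.
    print(imdb_url("tt0000001"))
```

A single formatter URL cannot express this because the path segment depends on the value itself, which is why the options discussed here are either separate properties or some kind of per-prefix lookup.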
Mpfh. Why do they do this... *sob* Ok as an easy fix we could have different properties for it here on Wikidata but that also is a pretty sucky solution I guess? --Lydia Pintscher (WMDE) (talk) 10:02, 22 January 2016 (UTC)
Yeah, different properties would suck a lot ;). --Jobu0101 (talk) 18:10, 27 January 2016 (UTC)
I think separate properties would be more consistent. For pretty much every other situation where a site has multiple types of identifiers, we use separate properties (AllMovie, AllMusic, AlloCiné, ...). It also helps with constraints, since, for example, a person should have a person ID not a film ID, but as far as I'm aware there's no way of doing that when the identifiers are combined into a single property. - Nikki (talk) 10:07, 28 January 2016 (UTC)
To check constraints on properties with multiple types of identifiers I set up a system based on sparql queries, see [1]--Pasleim (talk) 11:13, 28 January 2016 (UTC)

I think the problem is solved. It's working now. Who solved it and how? --Jobu0101 (talk) 11:59, 30 January 2016 (UTC)

It was Matěj Suchánek with this edit. Mbch331 (talk) 12:27, 30 January 2016 (UTC)

SPARQL query shows deleted item

The following query:

 
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>

SELECT ?item ?YouTube ?YouTubeEntity WHERE {
  ?item wdt:P2397 ?YouTube .
  ?item p:P2397 ?YouTubeEntity .
  FILTER(SUBSTR(STR(?YouTube), 1, 2) != "UC")
}

This query returns a deleted item as a result. The item was deleted on January 9th. It's the statement for Q21994171. Mbch331 (talk) 12:45, 27 January 2016 (UTC)

Pinging @Smalyshev (WMF): -- Jheald (talk) 12:56, 27 January 2016 (UTC)
Today it no longer appears in the results. Mbch331 (talk) 17:45, 28 January 2016 (UTC)

Limitations of the current SPARQL implementation

@Deskana (WMF), Lydia Pintscher (WMDE), Jheald: (Following up from the office hours discussion. In the time since then, it has turned out that several things I thought were impossible were actually possible, and that there's a lot I don't know about using SPARQL.)

Some apparent limitations in the current query system implementation:

  • The * and + operators can't give clear filters, such as relating to rank or qualifiers, to each link on the way. To give a hypothetical example: Former Province X (qId X) was a province of a country that existed from 1000-1500. The successor state(s) don't use the same subdivisions, but we still have historical articles on the provinces. So, suppose we want to generate a list of all people who were born in X. ?p wdt:P19 wd:QX . doesn't work for people born in subdivisions of X, or in other entities located in the administrative territorial entity (P131) X. So, we could try ?p wdt:P19/wdt:P131* wd:QX ., but that doesn't have them show up either. Why is that? Because the subdivisions of X are now part of other divisions. Those statements have the preferred rank, while the historical data we want uses the normal rank. We could try ?p wdt:P19/(p:P131/v:P131)* wd:QX ., but that comes with even more problems. First, all statements with deprecated rank are included, which we presumably don't want. Second, the results include people born in current subdivisions of X's subdivisions, which have start dates after X's dissolution, and thus were never part of X. We can make sure the birth date itself overlaps with X's existence, but we can't check each level of an unspecified number of parent territorial division statements to filter to only those which were true at the time of the birth date.
  • Suppose we want to query the average current population of standard neighborhood areas in the great non-country state of Foo, which has an inconsistent number of layers of subdivisions. But we have high standards for data; not only do we want sources for the population statements, we want sources for all the P131 statements leading to Foo. Real sources, not any imported from (P143) "sources". We could check any individual statement for this, but there's no way to do that for chains. (Similar situations include: Lists of prominent descendants of a certain individual, some types of taxonomical listings, teacher/student trees.)
  • Frequently, we want to identify items that have a certain "value type". This refers to whether an item is an instance of a certain class, which we can usually find out with wdt:P31/wdt:P279*. However, this doesn't always work perfectly. We also have subproperties. If we want to be more thorough, for P31 we can use ?instancewdt ^wikibase:directClaim/wdt:P1647? wd:P31 . ?p ?instancewdt [ wdt:P279* ?class ]. . Where this becomes a problem is with subclass of (P279). Ideally, we'd want to define ?subclasswdt similarly, but using ?subclasswdt* isn't valid syntax. (Currently there is only one subproperty of P279, but I suspect there will be more in the future.) (Similar issues exist for subproperties of other properties such as location or part of.)
  • Certain complex datatypes are extremely difficult to work with. For example, it is essentially impossible to determine whether a date is certainly later than another date, if they have different precision values. It is also not possible to accurately determine distance between two globecoordinates.
  • Certain complex calculations will inevitably result in a query timeout. I've been trying to work on a query that returns a list of all humans who lived the majority of their lives in a certain area, with some lines attempting to subtract overlapping parts of date ranges in residence statements. I have yet to figure out a query that doesn't time out.

--Yair rand (talk) 08:54, 2 February 2016 (UTC)
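To make the date-precision point concrete, here is a rough client-side sketch in Python of what "certainly later" could mean once the query results have been downloaded. It is an assumption-laden illustration, not something the query service offers: the function names are invented, only the precision codes 9 (year), 10 (month) and 11 (day) are handled, and dates are limited to what Python's datetime.date can represent.

```python
import calendar
from datetime import date

# Wikibase time precision codes; only these three are handled in this sketch.
YEAR, MONTH, DAY = 9, 10, 11


def date_bounds(value: date, precision: int) -> tuple[date, date]:
    """Return the earliest and latest day a value of this precision can mean."""
    if precision == DAY:
        return value, value
    if precision == MONTH:
        last_day = calendar.monthrange(value.year, value.month)[1]
        return value.replace(day=1), value.replace(day=last_day)
    if precision == YEAR:
        return date(value.year, 1, 1), date(value.year, 12, 31)
    raise ValueError(f"precision {precision} is not handled in this sketch")


def certainly_later(a: date, a_precision: int, b: date, b_precision: int) -> bool:
    """True only if every day covered by a falls after every day covered by b."""
    a_earliest, _ = date_bounds(a, a_precision)
    _, b_latest = date_bounds(b, b_precision)
    return a_earliest > b_latest


if __name__ == "__main__":
    # 15 June 1950 (day precision) vs. "1950" (year precision): not certain.
    print(certainly_later(date(1950, 6, 15), DAY, date(1950, 1, 1), YEAR))
    # 1 March 1951 (day precision) vs. "1950" (year precision): certain.
    print(certainly_later(date(1951, 3, 1), DAY, date(1950, 1, 1), YEAR))
```

The idea is to treat each value as an interval and compare the earliest possible day of one against the latest possible day of the other. Doing the same inside a query would require this interval logic to be expressed for every precision level, which is the difficulty described in the list above.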

Thanks so much for taking the time to write this down. That is very helpful.
@Smalyshev (WMF): also interesting for you. --Lydia Pintscher (WMDE) (talk) 09:06, 2 February 2016 (UTC)

Deletion and restoration of pages on other wikis cause data loss at Wikidata

When a page that is associated with a Wikidata item is deleted on another wiki, the link from that item is automatically removed. However, if that page is then undeleted for whatever reason, the link is not restored with it and all the interlanguage links, etc. are lost until someone notices and manually re-adds the link at Wikidata (there is no way to do this on the local project). See the history of Q5125870 for example: the en.wp article was accidentally deleted and immediately undeleted on the 2nd, but the link was not restored until I noticed the link removal on my watchlist today (5th). In addition to accidental deletions, pages may be deleted and then restored due to overturned (speedy) deletions, the merging of page histories and possibly other situations too.

There is a discussion about this on en.wp at w:Wikipedia:Administrators' noticeboard#If you delete and then restore a page, make sure to recreate the link to Wikidata, intended to raise awareness among admins there that this is an issue. However, that is at best a workaround, and there should be a technical solution. The options suggested in that thread are to either delay the removal of the link after a deletion (1 day was the suggested duration) or to have the undeletion of a page re-add the link here (my preference). There may be other options too. I don't know how either option could be implemented, or even if they are possible, but as it stands data is lost unnecessarily. Thryduulf (talk: local | en.wp | en.wikt) 20:47, 5 February 2016 (UTC)