Wikidata:Contact the development team/Archive/2019/09

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Adding a label that is already in use

I tried to add צבי as the he (Hebrew) label of Zvi (Q231342).

I got this error message: "Could not save due to an error. Item Zvi (Q55362931) already has label "צבי" associated with language code he, using the same description text."

What should be done in such a case? The word in Hebrew is the same word, but some people transliterate it Zvi and some Tzvi. Thanks, Uziel302 (talk) 13:52, 30 August 2019 (UTC)

@Harmonia Amanda: do you have an idea? Lea Lacroix (WMDE) (talk) 08:55, 2 September 2019 (UTC)
I cleaned up the descriptions so it would be clear that Zvi (Q231342) is about the Latin-script name and Zvi (Q55362931) is about the Hebrew name. I added the other transliteration as an alias for that one. --Harmonia Amanda (talk) 08:58, 2 September 2019 (UTC) edit:I also moved the sitelinks, which were all speaking of the Hebrew name, not the Latin-script one. --Harmonia Amanda (talk) 09:00, 2 September 2019 (UTC)

possible new tool: VIAF identifier importer

Hi folks -- I'm not sure this is quite the right place to ask, so if there's a better place to do so, please let me know.

I've been developing a tool to make it easier to import VIAF-linked identifiers into Wikidata. It's far enough along now that I'd like to get your input on the utility of the tool and how we might leverage it to be useful for the Wikidata community as a whole. Right now, it's hosted on my own web space. I'd prefer not to link to it publicly, so I've made a quick video showing how it works.

It looks up a Q-item and a VIAF ID, then looks at all the other identifiers linked from VIAF. It formats them as necessary, validates them against the format as a regular expression (P1793) associated with each identifier property, and spits out the appropriate QuickStatements-formatted data for all the identifiers it validated.

Not explained in the video is what happens to data that doesn't work out: if the tool doesn't know what to do with an identifier, or the identifier fails the regex check, it's noted as an error at the bottom and not put into the QS list.
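As a rough illustration of the validate-and-emit step described above, here is a hedged Python sketch (not the actual tool's code; the function name and the inline P1793 patterns are placeholders standing in for data that would really be fetched from each property's format as a regular expression (P1793) claim):

```python
import re

# Example patterns, as they might be fetched from each identifier property's
# P1793 ("format as a regular expression") claim. Hypothetical hard-coding
# for illustration only.
FORMAT_REGEX = {
    "P214": r"[1-9]\d(\d{0,7}|\d{17,20})",  # VIAF ID
}

def to_quickstatements(qid, identifiers):
    """Validate (property, value) pairs and split them into
    QuickStatements lines and errors, mirroring the behaviour
    described above."""
    lines, errors = [], []
    for prop, value in identifiers:
        pattern = FORMAT_REGEX.get(prop)
        if pattern is None:
            # Identifier the tool doesn't know what to do with.
            errors.append((prop, value, "unknown identifier"))
        elif re.fullmatch(pattern, value):
            # QuickStatements claim syntax: item|property|"value"
            lines.append(f'{qid}|{prop}|"{value}"')
        else:
            errors.append((prop, value, "failed regex check"))
    return lines, errors
```

For example, `to_quickstatements("Q42", [("P214", "113230702")])` would emit one QS line and no errors.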

Does this look like something that would contribute to the Wikidata world and that isn't already accomplished elsewhere? If so, what would be an appropriate development path for bringing it into production?

Some areas for development I have in mind include:

  • adding rules for handling some identifiers it currently doesn't know what to do with
  • there could be a bot that does the same thing for records that already have VIAF ID (Q19832964) statements, bypassing QuickStatements and just adding the extra IDs automatically

What do you think?

Thanks - Kenirwin (talk) 19:55, 4 September 2019 (UTC)

Hello,
Thanks for your work! I think you should bring this discussion to the mailing list or the project chat page, where more people can give feedback about it :) In general, people autonomously develop tools and add them to the Wikidata:Tools page. Feel free to do the same. Lea Lacroix (WMDE) (talk) 10:42, 9 September 2019 (UTC)

What happened with "undo"?

It seems I can't find it any more when selecting multiple edits. I could click rollback, but that is generally meant for something else. --- Jura 13:53, 10 September 2019 (UTC)

Hello, is the issue still happening for you? I tried again and I see "undo". Lea Lacroix (WMDE) (talk) 13:59, 11 September 2019 (UTC)
Looks like it's back. That was quick! Thanks to all involved. --- Jura 18:20, 11 September 2019 (UTC)

phab:T209208, can this be deployed more quickly?

There's local agreements now. --Liuxinyu970226 (talk) 22:17, 12 September 2019 (UTC)

Thanks for the ping, we'll move forward with this as soon as our resources allow it. Lea Lacroix (WMDE) (talk) 11:55, 16 September 2019 (UTC)

Distinct page background colo(u)r based on P31 (for Q5)

In a discussion on project chat, @Simon Villeneuve: brought this up. I think it would be an interesting addition. Could we try one for items with instance of (P31)=human (Q5) to start with? Pick whatever color GUI designers suggest. --- Jura 14:08, 13 September 2019 (UTC)

I think this is a good idea for a user script, but not for a default feature for all users. Lea Lacroix (WMDE) (talk) 14:55, 13 September 2019 (UTC)
Another similar request: when an item has any dissolved, abolished or demolished date (P576), show a box or an icon saying "closed" or something like that... (perhaps asking for the moon :) ) Bouzinac (talk) 15:07, 13 September 2019 (UTC)
  • Re: Background color: maybe a gadget? We can then discuss if it should be on by default. The problem is that once users figure out how to use user scripts, they wouldn't really need it anymore.
    Looking at the html source of a page, I don't think there is any feature that could use css for it. Could we include some support? --- Jura 15:09, 13 September 2019 (UTC)
Lydia and I have been looking again at the original discussion. We acknowledge the issue about duplicates, but we don't think that adding a background color is a good way to solve it. Also, background color is not the kind of information that everyone can interpret correctly, and is not accessible for everyone. So we are not going to add this as a default feature. Lea Lacroix (WMDE) (talk) 11:09, 16 September 2019 (UTC)

mw.loadData and wikibase limit

I bumped into this more or less by accident, but what if Wikibase made its own mw.loadData() with a higher load budget and better caching? Now mw.loadData() caches for the duration of one page, but what if we could cache for a longer time and invalidate the cache as necessary? It would make it possible to use higher load limits, because the computed data could be reused. The generated data could be stored as a pageprop blob, and the module generating the blob would have to track all items it depends upon, but that is pretty straightforward.

A sufficient criterion for the data set to be cached would be that it does not use implicit loading of a connected item. A necessary criterion for reusing the cached data would be that the page requesting the data has a current revision older than the timestamp of the cached data. That is necessary, but not sufficient, as some other item might have been updated without the data set being invalidated yet. (The timestamp might differ from the current revision if the pageprops table is manipulated, but it is probably better to do something like that in a separate table. Or memcached.)
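The validity rule sketched above can be expressed in a few lines. This is a hypothetical illustration only (the function name and the idea of passing dependency timestamps explicitly are assumptions, not part of any existing Wikibase API); timestamps are compared as plain numbers for simplicity:

```python
# Hypothetical sketch of the cache-validity rule described above: the cached
# blob carries its own timestamp plus the timestamps of the items it was
# computed from. The cache is reusable only if the requesting page is older
# than the blob (the necessary criterion) and no tracked dependency has
# changed since the blob was generated (closing the "not sufficient" gap).

def cache_is_valid(cache_timestamp, page_revision_timestamp, dependency_timestamps):
    # Necessary: the page itself has not been edited after the blob was built.
    if page_revision_timestamp > cache_timestamp:
        return False
    # Also required: none of the tracked items changed after the blob was built.
    return all(t <= cache_timestamp for t in dependency_timestamps)
```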

It would solve problems like generating GDP indexes for countries and sorting the list to get a ranking, the flag lookup for sports articles, or listing ESC winners (and Nobel winners, etc.) in the individual articles. It is, however, most interesting for articles depending on some computation or special formatting of the data. For example, when the GDP indexes must be expressed in a local currency. (Yes, this can be done in WQS.)

It would probably also fit very nicely together with the Wikibase Query service, but it solves a slightly different problem. I'm not sure WQS can expose all traversed relations, and thus be able to invalidate the cached data. That could be a showstopper.

A function mw.wikibase.loadData() could actually make it (nearly) unnecessary to integrate the Wikibase Query service in Wikipedia. (Now I'm going to be unpopular, again…) Jeblad (talk) 21:57, 8 September 2019 (UTC)

@Jeblad: What exactly (statements of an entity?) would you load via that new method? Why are the current access methods not enough for these use cases (they are also, although only for a limited number, in-memory cached)? -- Hoo man (talk) 16:25, 16 September 2019 (UTC)
@Hoo man: This is typically for pages that need a large fixed set of statements for comparison or presentation. Information about nationality for winners of a large sports contest, typically such things as flags. Comparing normalized demographic statistics (also GDP, land use, etc.) to create a ranking: those numbers are non-static and must be recalculated when the numbers for any country change. Rankings of mountain tops and lake areas can be put into the items, as this is information that will not change; in those cases too it could be better to calculate the ranking in a separate module. A third use case would be to collect pages about numerical units, to do such things as getting the short forms, simplifying units into normalized short forms, or inverting unit names. This particular use case can be solved by autogenerating redirects and a few additional properties. A fourth use case is to build inheritance trees, but that should probably be solved explicitly. Now we avoid at all costs traversing instance of (P31) and subclass of (P279), but properly checking the type hierarchy is important for creating navigation templates.
[rant] Navigational templates could perhaps be handled with mw.wikibase.getReferencedEntityId( fromEntityId, propertyId, toIds ), but I have not tried to reimplement any of the existing modules. It is probably (?) implemented slightly wrong: it should filter the set of toIds down to the ones that satisfy the constraint, not only return a random match. It should be two methods, one mw.wikibase.isReferencedEntityId and another mw.wikibase.filterReferencedEntityId.
The current blocker for creating a module like that is the limit of 500 requests. Unwinding information for all countries eats a good chunk of the load budget, and we are hard pressed to create pages that stay within the limit. Jeblad (talk) 18:33, 16 September 2019 (UTC)
@Jeblad: So what you want is to have a pre-computed (and cached) collection of statements, collected according to certain rules (probably by a Lua module that can be invoked stand-alone to collect this data into a table)?
Regarding mw.wikibase.getReferencedEntityId: If that is needed often, we could potentially create a similar function that checks for the presence of all ids and return those that are referenced. --Hoo man (talk) 18:22, 17 September 2019 (UTC)
@Hoo man: On the first part; yes, and it would be pretty close to Wikibase Query service integrated into Wikipedia. Most use cases will be better served with WQS, but some would be hard to implement with WQS alone. You might view this as a map-reduce problem, where map is implemented as WQS and reduce (or conditioning) as Lua code.
On the second part; What I did was to create standardized navigational templates for administrative areas. (Same thing happen for a lot of such templates.) Typically I have a set of statements from contains the administrative territorial entity (P150), but must filter that down to real municipalities. Statements like Q103732#P150 contains weirdness like Dalarna County Council (Q3231325). It is not an administrative area, it is the administration itself. The entities from P150 must be filtered on type municipality of Sweden (Q127448), or even better municipality (Q15284) or second-level administrative division (Q13220204). Trying to traverse the hierarchy iteratively is awfully heavy, but can be done much more efficient with mw.wikibase.getReferencedEntityId, especially if it is implemented somewhat better. Just put in a flag to signal if any hit is sufficient or whether filtering the whole set is necessary. [Seems like what I describe is in the opposite direction of what getReferencedEntityId does…] Jeblad (talk) 19:36, 17 September 2019 (UTC)
Note that this is really a feature request for Lua, but the reason for the request is load-situation due to Wikidata. Jeblad (talk) 00:40, 19 September 2019 (UTC)

Some automatic edit summaries are useless

Hello,
Automatic edit summaries are necessary to fight vandalism efficiently. But sometimes I see the automatic edit summary "‎Updated item" (which is useless). Why?
For instance:

  • MediaWiki says "‎Updated item" here, whereas it was able to give a useful edit summary in the same context (‎a change of a label) here.
  • MediaWiki says "‎Updated item" here, whereas it was able to give a useful edit summary in the same context (additions of a label and a description) here.
  • MediaWiki says "‎Updated item" here, whereas it was able to give a useful edit summary in the same context (‎an addition of an alias) here.
  • MediaWiki says "‎Updated item" here, whereas it was able to give a useful edit summary in the same context (‎a change of an alias) here.

Regards --NicoScribe (talk) 22:10, 13 September 2019 (UTC) → 18:58, 14 September 2019 (UTC)

Hello, these edits come from the new mobile termbox. We're currently working on improving these edit summaries, per phab:T220696: in the upcoming weeks, the new edit summaries will be slightly more explicit ("Changed label, description and/or alias in # languages") and more improvements will follow. Feel free to check edits again in a few weeks, and let me know if you still encounter this issue. Lea Lacroix (WMDE) (talk) 10:11, 16 September 2019 (UTC)
@Lea Lacroix (WMDE): It would be preferable to just use standard edit summaries that actually show "Pablo" on Special:Diff/1012945713 for item John Locke (Q21198546), not "Changed label, description and/or alias in 1 languages" (sample by @NicoScribe: above).
Can you provide the community request that lead to this change? --- Jura 10:16, 18 September 2019 (UTC)
Edits coming from the new mobile termbox will gather several changes to labels, descriptions and aliases in various languages, which means one edit can possibly contain changes in 50+ languages. If we described them all in the edit summary, it would be way too long. As described in phab:T220696, the original idea is to split the cases depending on how many terms were changed. If a change is made on fewer than 5 terms, we would provide an extended version, e.g. "Added [en] label: Frame of Notre-Dame de Paris, Changed [en] label: Maja, Added [fr] alias: Marie". But if there are more, we will have a shortened version; the fallback version, used when many terms are changed, is "Changed label, description and/or alias in 60 languages".
The first step of this feature will be to display the fallback version by default, before we're able to detect the number of terms and display the correct option.
Please note that this change will only have an impact on edits made via the wbeditentity API, which includes the new mobile termbox. Other edit summaries will remain the same.
As this feature is part of a broader project that didn't exist before (the new termbox interface on mobile), it is not based on a community request. Lea Lacroix (WMDE) (talk) 10:44, 18 September 2019 (UTC)
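The rule described above (itemized summary for few terms, a per-language count as fallback) can be sketched in a few lines. A hypothetical illustration only, not the actual MediaWiki implementation; the function name, tuple shape, and threshold handling are assumptions:

```python
# Illustrative sketch of the summary rule described in phab:T220696:
# list each change while there are few of them, otherwise fall back to a
# count of affected languages.

def edit_summary(changes, max_detailed=5):
    """changes: list of (action, lang, kind, value) tuples, e.g.
    ("Added", "en", "label", "Frame of Notre-Dame de Paris")."""
    langs = {lang for _, lang, _, _ in changes}
    if len(changes) < max_detailed:
        # Extended version: spell out every change.
        return ", ".join(
            f"{action} [{lang}] {kind}: {value}"
            for action, lang, kind, value in changes
        )
    # Fallback version: only report how many languages were touched.
    return f"Changed label, description and/or alias in {len(langs)} languages"
```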
Makes sense. Thanks for doing that. --- Jura 13:24, 21 September 2019 (UTC)

Designing to avoid query limits

Hello all,

A Python project I am working on keeps getting HTTP code 429, which has resulted in a ban from the service. I continued to produce requests after the initial result in an (unsuccessful) attempt to figure out the problem (after reading the link here I still don't understand how to use a header).

The goal of the project is to iterate through a list of wikidata items and retrieve the value for a preselected property of the items. (For example, it could grab the run times of a saved list of films)

Any advice on how to get the ban lifted, and especially on how I can design my project in order to avoid query limits (for instance, do I need to code in a pause after each iteration through the list?), would be greatly appreciated. Reason&Squalor (talk) 20:26, 4 September 2019 (UTC)

Hello, does your tool have a user agent header? Tools without one may be blocked on the Query Service. Lea Lacroix (WMDE) (talk) 10:46, 9 September 2019 (UTC)
I assume I don't. I've read the page you linked a few times (both when looking up the 429 and after your reply), and can't understand how to figure out whether I have one, or how to add one if I don't. The tool is a Python file made following the basic Django tutorial. I'm not sure if this is relevant, but the tool was not initially blocked; it took a dozen or so uses to get the 429.

Thanks for your reply. Reason&Squalor (talk) 16:10, 10 September 2019 (UTC)

You can find an example here: m:User-Agent_policy. Lea Lacroix (WMDE) (talk) 07:02, 11 September 2019 (UTC)
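For a script (rather than a browser), the header is set on each request. A minimal standard-library sketch, where the tool name, URL and contact address are placeholders to be replaced with your own, following the format recommended at m:User-Agent_policy:

```python
import urllib.request

# Placeholder identification string: replace the name, version, URL and
# contact address with your own, per m:User-Agent_policy.
USER_AGENT = "MyWikidataTool/0.1 (https://example.org/mytool; mytool@example.org)"

def make_request(url):
    """Build a request that identifies the tool via a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

def fetch(url):
    with urllib.request.urlopen(make_request(url)) as resp:
        return resp.read()
```

If you use the third-party requests library instead, the equivalent is passing `headers={"User-Agent": USER_AGENT}` to `requests.get`.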
Thank you for the example. Overriding the browser User Agent Header with my own did not have an effect; I still get the 429. What steps should I take next? Reason&Squalor (talk) 00:41, 18 September 2019 (UTC)

Hello! Could you share the user agent that you are using, and examples of the queries? I can try to track them in the logs and see if I find something interesting. More generally, when you are receiving an HTTP 429 response, it will contain a "Retry-After" header. Your script should sleep for this amount of time. Or you could just sleep for 2 minutes. GLederrey (WMF) (talk) 10:00, 18 September 2019 (UTC)
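The back-off behaviour described above could be sketched like this (function name and default are illustrative; note that Retry-After may also be an HTTP-date rather than a number of seconds, which this sketch treats the same as a missing header):

```python
import time

DEFAULT_BACKOFF = 120  # seconds, per the "just sleep for 2 minutes" suggestion

def backoff_seconds(headers):
    """Return how long to sleep after a 429 response: the Retry-After
    value when present and numeric, otherwise a fixed default."""
    try:
        return int(headers.get("Retry-After", DEFAULT_BACKOFF))
    except (TypeError, ValueError):
        # Missing, malformed, or HTTP-date Retry-After: use the default.
        return DEFAULT_BACKOFF
```

Usage would be something like `time.sleep(backoff_seconds(response.headers))` before retrying the query.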

I will email you the user agent I'm using, since it contains contact information that I don't want to post here. Thank you for pointing me to the "Retry-After" header. I am finding that the response does not always include it, however (it's missing ~1/5 of the time). Reason&Squalor (talk) 15:33, 22 September 2019 (UTC)

WDQS lag

Hey dev team, we have way too much WDQS lag again, for several days now [1]. User:Smalyshev (WMF) apparently left his position, and there currently seems to be a vacancy regarding WDQS maintenance.

In order to ensure that the important Query Service does not suffer too much while running unattended, I suggest making phab:T221774 happen. This should reduce write loads in times of high WDQS lag. Would that be possible? —MisterSynergy (talk) 08:18, 24 September 2019 (UTC)

Thanks for noticing. People at WMF are working on it at the moment, and we hope for a decrease of the lag in the next hours. As for phab:T221774, it's in our pipeline and we will continue working on it soon. Lea Lacroix (WMDE) (talk) 14:44, 24 September 2019 (UTC)

MacBook Pro trackpad inaccuracy using Google Chrome

Hi. I've found that my trackpad misses the mark when trying to highlight something in a text box here in Wikidata. I can select text accurately almost anywhere else, in Wikidata or on other sites. When in WD, I find the mouse highlights a few letters to the right of what I'm actually trying to highlight. Sometimes highlighting doesn't work at all. I don't think it's my MacBook, but maybe it is. Trilotat (talk) 14:26, 25 September 2019 (UTC)

Hello, one of my colleagues tried on his own Macbook. He selected content, both in and out of edit mode, without any issue.
Have you tried with a different browser? Lea Lacroix (WMDE) (talk) 14:50, 26 September 2019 (UTC)

IP Range Queries

I created Wikidata:Property_proposal/IP_range_start, but I'm wondering if there is a technical solution that could be implemented that would prevent the data from having to be duplicated? If so, how would that be implemented? U+1F360 (talk) 15:50, 26 September 2019 (UTC)

Just mentioning here that the discussions continues on the property proposal talk page. Lea Lacroix (WMDE) (talk) 12:14, 30 September 2019 (UTC)

Del borked triples

phab: discussed here: it seems a simple thing to fix, but I don't think it has been done yet.

When querying, it can give confusing results (multiple triples appear where there should be just one).

Can you look into it? --- Jura 11:26, 29 September 2019 (UTC)

Thanks for letting us know. Could you try editing one of the items, then run the query a few hours later? It may be cached somehow.
If this doesn't solve the problem, we can consider reloading the Query Service with a new dump - I'm not sure how long this would take. Lea Lacroix (WMDE) (talk) 12:31, 30 September 2019 (UTC)
Stas used to have a procedure to re-load individual entities to the Query Service servers in case there was something wrong with them. He used it occasionally when edits were missing from WDQS servers for whatever reason. It seems it would be best if that script were used here, but we cannot do that as simple Wikidata users. Mind that we cannot do "null edits" on entities, and finding a possible change for all entities does not seem adequate here. --MisterSynergy (talk) 13:15, 30 September 2019 (UTC)
I agree, you can't really expect users to edit tens of thousands of items. Here are some:
SELECT * { { ?st wikibase:timeCalendarModel wd:P1985727 } UNION { ?st wikibase:timeCalendarModel wd:P1985786 }  }
Try it!
Maybe talk to Lucas Werkmeister, supposedly sitting next to you. I think he tracked down much of the issue. --- Jura 13:59, 30 September 2019 (UTC)
SELECT ?st (str(?unit) as ?str_unit) 
{
    ?st wikibase:quantityUnit ?unit .
    FILTER( strstarts( str(?unit), "http://www.wikidata.org/entity/P" ) )
}
Try it!
Here is a second query with triples to delete. It doesn't solve the issue entirely, but deleting these two groups would improve the situation for critical details and might be fairly simple to do.
A point to check might be whether the code is really fixed. The reported problem with P7295 is recent: the property was created on 8 September 2019 only. --- Jura 17:22, 30 September 2019 (UTC)

Line charts viz weirdness and feature request

Following a request in WD:Request a query to graph the number of members in an organisation

I find it weird that a query like this one does not work: it seems really similar in spirit to this line chart documentation example, except that the datatype of « year » is number instead of string. It seems that a sum is performed over all the lines of the table result, for a reason unclear to me. (It makes sense to treat it as a number: this makes it easier to treat missing information as gaps, without having to deal with the problem of, say, year 2000, 2010 and 2015 points being shown with equal spacing if we have only those 3 datapoints.)

Which leads to the feature request, motivated by a query request: it seems that everything works fine when keeping the date datatype instead of trying to extract the year, but… the intent was to display only the year on the chart, whereas the x-labels of the datapoints are displayed as 1-1-year, ignoring the precision of the date. Should this display not omit the month and day if the precision is « year »? This would avoid the need to extract the year of the date (as a number). author  TomT0m / talk page 21:06, 16 September 2019 (UTC) Now that I think about it, this is a stupid feature request in this case, as the precision is not available with simple RDF values of statements. As this query does use simple values, I don't know how WDQS could be aware of the precision of the value…

  • On second thought, the feature request would be to simply display the year in the X-axis label if there seems to be no more than one value per year, or something like that. author  TomT0m / talk page 17:42, 17 September 2019 (UTC)
I think the issue you’re encountering is the long-standing bug T168341 (but perhaps I’m misunderstanding something). --Lucas Werkmeister (WMDE) (talk) 17:35, 18 September 2019 (UTC)
The reason for the weirdness seems to be bound to column parsing: identifying which columns are the X-axis and Y-axis (which are defined to support Number, Label and Datetime per the doc), and which columns define series. For the given query it identifies both org and year as series determiners, breaking the logic of the graph. It might be fixed by converting year to text, in the same way as an example in the mentioned doc does, see the link. It does not necessarily have to be done in the inner query; the conversion in the outer select also works: see the fixed query.  – The preceding unsigned comment was added by Igorkim78 (talk • contribs).
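The workaround described above, casting the numeric year to a string in the outer SELECT, looks roughly like this. A sketch only: the variable names are illustrative and the inner query stands in for the original one, which is not reproduced here:

```sparql
# Sketch: cast the numeric ?year to a string in the outer SELECT so the
# line chart view treats it as an X-axis label rather than a second
# series determiner.
SELECT ?org (STR(?year) AS ?year_label) ?members
WHERE {
  {
    # ... original inner query binding ?org, ?year (number) and ?members ...
  }
}
```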
@Lucas Werkmeister (WMDE): The problem is that I wanted to avoid doing this conversion; I was aware it would kind of work. But if there is a missing year, or several values for one year for some reason in the series (say one per month for just one year), the query becomes more complicated for no good reason imho: a year gap will result in irregular spacing of the points (say 1980 is missing: the space between 1979 and 1981 will be the same as between consecutive years). I could fix this, I think I found solutions in the past (querying the items of the years between the max and min dates, for example, can help adding a phantom point on the X-axis). Some sample or mean point would have to be chosen if there are several points for a year, or a filter according to date precision, or something like that. But it's trickier, and having a date-kind X-axis instead makes it easier to make the query robust. author  TomT0m / talk page 19:32, 30 September 2019 (UTC)