Wikidata:Contact the development team/Archive/2020/01


Hi,

I am a passionate reader, and I am reaching out to you with the following request: I would like to contribute to your website by providing a guest post.

The guest post would surely be based on a great topic aligned with your readers' interests. The content would be of high quality and plagiarism-free. However, in return I would need the favor of a backlink within the main body of the article.

Kindly let me know if I can interest you with some great topic ideas.

Looking forward.

Sincerely,

Umer Isfaq


Hello, we want to contribute by releasing soon a working installation of Wikibase (Composer, not Docker) with federation capabilities. Please check the related Phabricator task:

https://phabricator.wikimedia.org/T223355

as well as this document

https://docs.google.com/document/d/1YYIuQzcWz2cH9zTUbbfiBrTecVxMWAu3_9-U4DcKzEU/edit#heading=h.z49pje934vur

The "add statement" link/button might be replaced with a dropdown button from which selecting the remote wikibase instance (including, but not limited to wikidata) from where *instantly* retrieving/importing items and properties.

Then in the wikibase data model they have to be identified by a specific and univocal namespace to allow coexistence of items/properties of various wikibase instances, on the same local installation, without conflict. (and for possible future export into wikidata itself)


I'm part of an IT team and a programmer myself, so of course I can help.

Thanks (Thomas) thomas.topway.it at mail.com

Hello,
We're currently refining the first minimum viable product of our federation project. We expect the development to start at the beginning of January. The MVP will only include the ability to reuse Wikidata's properties on external wikibase instances. Lea Lacroix (WMDE) (talk) 15:45, 6 January 2020 (UTC)

Wikidata dump files


Hi everyone! I have some questions about the Wikidata dumps. I posted these to the project chat page, but I didn't get any answers and someone suggested I post them here.

I'm importing data from the JSON dumps into a database, but the dumps do not have redirect information. Because of this, some relationships point to non-existent entities, which is causing data consistency issues. I found Phabricator tasks mentioning that the JSON dumps don't include redirect info, but there hasn't been any recent activity on them. I'm ignoring the missing entities for now, but I need to find a longer-term solution that takes redirects into account.

To work around the issue, I tried parsing the owl:sameAs predicates out of the RDF files and creating an entity for each subject. That created some duplicate entities, which leads me to believe that some of the subject entities in the list of redirects exist in the JSON dump and some do not. This makes sense if the dump files are not snapshots in time, or if they are but the JSON and RDF dumps are run at different times.
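
For illustration, here is a minimal sketch of this kind of redirect extraction, assuming an N-Triples style dump in which redirects appear as owl:sameAs triples between entity URIs; the file handling and helper name are hypothetical, not the poster's actual code:

```python
# Hypothetical sketch: collect redirect pairs from a Wikidata RDF dump by
# scanning for owl:sameAs triples between entity URIs. Assumes an N-Triples
# style dump (e.g. a gzipped .nt file); adjust the path and parsing as needed.
import gzip
import re

SAME_AS = re.compile(
    r'<http://www\.wikidata\.org/entity/([QP]\d+)> '
    r'<http://www\.w3\.org/2002/07/owl#sameAs> '
    r'<http://www\.wikidata\.org/entity/([QP]\d+)>'
)

def redirect_map(dump_path):
    """Return a {redirected_id: target_id} mapping found in the dump."""
    redirects = {}
    with gzip.open(dump_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            match = SAME_AS.match(line)
            if match:
                redirects[match.group(1)] = match.group(2)
    return redirects
```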

To try to understand this better, can someone answer the following for me:

  1. Within a single dump file (JSON and/or RDF), is every entity that appears in a relationship guaranteed to be defined in the file? In other words, does the dump file represent a self-consistent snapshot of the database at a moment in time?
  2. If the above is true, then is it also true across the JSON and RDF files in a given dump directory? Looking through Phabricator, I found some statements in T144103 that imply that a single dump file is consistent, but that different files are not. However, T128876 implies that even a single file may not be consistent.
  3. I looked at various ways to access JSON data on entities, and I get redirect data via the API (e.g. https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q390537) but not via the persistent URI (e.g. https://www.wikidata.org/wiki/Special:EntityData/Q390537.json). Is the latter what is used for the JSON dump files? If so, are there any plans to put redirect info like in the former into the dumps?
  4. Should I be using the RDF dump instead of the JSON dump? I'd prefer JSON because of the availability of standard parsing tools, and because the one-entity-per-line format makes parsing much easier.

If there is a more appropriate place for me to ask these questions, or if you need more information from me, please let me know. Thanks in advance for the help! --Mventimi (talk) 16:47, 5 January 2020 (UTC)

Hello @Mventimi:, thanks for your questions. I'll need to investigate and ask my colleagues for more detailed answers, but I can already tell you that the answer to your first question is no, unfortunately. Lea Lacroix (WMDE) (talk) 15:47, 6 January 2020 (UTC)
@Mventimi:
  1. I believe the dumps are not a single snapshot – I’m not sure, but I think they’re only granular at the entity level. (The dumps take several hours to generate; maintaining a consistent view over that time wouldn’t be easy.) But note that Wikibase doesn’t prevent admins from deleting referenced entities anyway, so even in an atomic snapshot you might have references to deleted entities.
  2. I’m pretty sure they’re independent.
  3. For RDF, I think the dumps use the same format as https://www.wikidata.org/wiki/Special:EntityData/Q390537.ttl?flavor=dump. For JSON the “dump” flavor doesn’t seem to make a difference, and what you get in the dumps should™ be more or less the same as for Special:EntityData, except for the missing redirect info. You should be able to detect redirects through the differing "id" field, though (you have the redirect ID in the URL but the target ID in the "id"); a small sketch of this check follows after this reply.
  4. I think you can use either, depending on which works better for you.
Hope this helps! --Lucas Werkmeister (WMDE) (talk) 17:04, 6 January 2020 (UTC)
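
A minimal sketch of the "id" check described in point 3 above, assuming the public Special:EntityData endpoint and the Python requests library; the helper name is made up:

```python
# Hypothetical sketch of the redirect check described above: fetch an entity's
# JSON from Special:EntityData and compare the requested ID with the "id" field
# that comes back. A mismatch means the requested ID redirects to the returned one.
import requests

def redirect_target(requested_id):
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{requested_id}.json"
    data = requests.get(url, timeout=30).json()
    # Take whichever single entity the response contains; after a redirect,
    # its "id" is the target ID rather than the ID in the URL.
    entity = next(iter(data["entities"].values()))
    return entity["id"] if entity["id"] != requested_id else None

# e.g. redirect_target("Q390537") should return the target ID if it redirects.
```
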
Please note: although new statements may only be created when they refer to existing entities, it is possible that a statement uses a property or item that has since been deleted. This is usually an error (see Wikidata:Database_reports/Deleted_Wikidata_entities_that_are_still_linked), but you should not assume that entities in a relationship always exist.--GZWDer (talk) 13:48, 7 January 2020 (UTC)
Also, JSON is how data in Wikidata is stored internally. Wikibase generates RDF from JSON, but not vice versa.--GZWDer (talk) 13:54, 7 January 2020 (UTC)
Thanks Lea, Lucas and GZWDer for your answers! They clarify a lot for me. It's understandable that the files aren't internally consistent; like you mentioned, you'd have to lock the database for hours, which is impractical. Your confirmation that the individual entries are consistent helps, though.
GZWDer, are you saying that the RDF dumps are generated directly from the JSON dumps? Or that the RDF dumps are generated from the internal JSON representations? Per T94019 I think the former is not true, but you will have better information than I do.
For the redirects, I know that KrBot cleans up statements that link to redirected entities, resolving them to the target entities. In addition, links to deleted entities (as shown above) will eventually be removed. I get that some bad links in the JSON dumps are unavoidable, as there is always a window between redirects/deletions and their cleanup. However, having redirect data in the dumps should resolve a significant portion of these bad links. Pulling the redirect data from the RDF is working for now, but to avoid downloading both dumps it would be preferable if the redirects were contained in the JSON dump. T98320 is tracking a plan to create a separate JSON redirects dump, but that issue hasn't been updated in almost 2 years. The good news is that the last comment is by Hoo_man, where he raises concerns similar to mine about consistency. Is this the right place to voice my support for his proposal? I didn't know whether it would be appropriate to comment in the ticket directly.--Mventimi (talk) 05:54, 11 January 2020 (UTC)
@Mventimi: To answer the first question – both the JSON and the RDF dumps are generated from the internal JSON representation, which is different from both of them. (For one, the internal storage may contain earlier serialization format versions, but dumps and Special:EntityData will always give you the latest version.) --Lucas Werkmeister (WMDE) (talk) 15:36, 14 January 2020 (UTC)

Issue with property constraints


Dear friends,

I am trying to add a constraint to a property, in the following way:

start node → property constraint → inverse of → end node

When I added this, Wikidata indicated that the statement had an issue, namely that start node was not an instance of Wikidata property, or a subclass of an instance of Wikidata property. I then also added the statement:

start node → instance of → Wikidata property

However, the issue keeps showing up, so my question is: what am I doing wrong?


Thanks in advance for any support.


Luis Ramos

Please always give the P/Q number or link to items/properties you are talking about. --SCIdude (talk) 07:39, 14 January 2020 (UTC)

Wikidata:Database reports/Constraint violations/P225


Creating a diff like this has been giving a fatal exception error of type „WMFTimeoutException“ for some months now. As I found out today, everything works fine when I switch my UI language to EN. --Succu (talk) 21:04, 18 December 2019 (UTC)

The problem still persists. --Succu (talk) 21:12, 9 January 2020 (UTC)
@Lea: Any thoughts? --Succu (talk) 20:10, 20 January 2020 (UTC)
Hello @Succu:, and sorry for the delay. I tried to reproduce the issue, and for me, in any language, the page loads forever before rendering an error. We've been looking into it, but unfortunately it seems that the page is just too big to be rendered, and there's nothing we can immediately do about it. Sorry about that! Lea Lacroix (WMDE) (talk) 12:05, 21 January 2020 (UTC)

Watch property uses


As I was discussing with @VIGNERON: here, I have this proposal: just as on Wikipedia we can have in our watchlist all the pages that are added to a watched category, on Wikidata it would be interesting to be able to have in the watchlist all the items to which a watched property is added (of course both possibilities should exist: watching a property and also seeing in the watchlist the pages where it gets added, or simply watching the property itself ... otherwise no one would watch properties like VIAF ID (P214)!). Is this possible? Thank you very much as always! --Epìdosis 16:56, 19 January 2020 (UTC)

It is not included in the watchlist, but would SPARQL RC help with this use case? This tool from Magnus allows you to watch the results of a defined query, and that query could be about a specific property (an example sketch follows below). Lea Lacroix (WMDE) (talk) 11:34, 20 January 2020 (UTC)
@Lea Lacroix (WMDE): SPARQL RC can be very useful, thanks! --Epìdosis 09:59, 24 January 2020 (UTC)
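
For reference, here is a small sketch of the kind of query such a tool could monitor, using VIAF ID (P214) from the discussion above; the helper and user agent are illustrative, written in Python with the requests library against the public query service:

```python
# Illustrative only: list items that use a given property (VIAF ID, P214, as in
# the discussion above) via the Wikidata Query Service. A query like this is
# what a tool such as SPARQL RC could re-run to spot new uses of the property.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?item WHERE {
  ?item wdt:P214 ?value .
}
LIMIT 10
"""

def items_using_p214():
    response = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "watch-property-example/0.1 (illustrative)"},
        timeout=60,
    )
    response.raise_for_status()
    bindings = response.json()["results"]["bindings"]
    return [b["item"]["value"] for b in bindings]
```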

Search index is no longer updated in real time


For several days, when you create a new item or add an alias, it has taken several minutes before you can find the item in search. Is this a known issue? Ayack (talk) 14:51, 23 January 2020 (UTC)

The servers are quite busy doing compression of the databases at the moment, which probably is slowing things down here, see phab:T232446. Thanks. Mike Peel (talk) 16:22, 23 January 2020 (UTC)
Also note that search updates are "real time" in the sense that there is a limit to how long it takes to process them, not in the sense that they occur synchronously. That limit is several minutes. In many cases, updates are processed before the deadline, but it is normal for updates to take time in transit. Delay can occur because of wait time on MySQL slave replication or a large queue of updates to process; lastly, Elasticsearch itself always has a delay before a write is committed and available in searches. GLederrey (WMF) (talk) 10:17, 24 January 2020 (UTC)

Watchlist, usage tracking and descriptions on mobile phone


The initial discussion on frwiki: fr:Wikipédia:Le_Bistro/26_janvier_2020#TA_GRAND_MERE_SE_PROPAGE_AU_CANADA @Bob Saint Clar:

It seems that description vandalism is not tracked by the usage tracking, although descriptions are used as subtitles on mobile phones. So the vandalism does not show up in the Wikipedia watchlist with Wikidata edits activated. Should this be corrected? author  TomT0m / talk page 17:25, 26 January 2020 (UTC)

Thanks for the info. Indeed, edits on descriptions are not included in the Wikipedia watchlist yet. I'll see how we can move this forward in the near future, but I can't promise anything. Lea Lacroix (WMDE) (talk) 09:41, 27 January 2020 (UTC)

Scaling WDQS and WikiData


I am working on a grant proposal for scaling Wikidata and WDQS, trying to address some of the goals stated in the 2030 strategy. Please visit https://meta.wikimedia.org/wiki/Grants:Project/WDQS_On_FoundationDB . The title will probably change. --i⋅am⋅amz3 (talk) 23:36, 26 January 2020 (UTC)

Implement export/archive/move to other Wikibase instance feature


As the title says.

Ogoorcs (talk) 23:04, 29 January 2020 (UTC)

Hello @Ogoorcs:, this feature is not planned in the Wikibase roadmap for now. Feel free to create a task on Phabricator describing your use case and how you would expect such a feature to work. Lea Lacroix (WMDE) (talk) 09:03, 30 January 2020 (UTC)

Wikimedia Cloud services project connecting to Wikidata


We would like to have Wikispore (Q67605965), a community project hosted on Wikimedia Cloud Services, connected to Wikidata. We would like to use Wikidata values at Wikispore to populate infoboxes, etc., on the cloud-hosted project. A one-way connection would be sufficient for our current purposes.--Pharos (talk) 03:25, 30 January 2020 (UTC)

Hello @Pharos:, unfortunately, at the moment arbitrary access (via parser functions or Lua) from projects hosted on Wikimedia Cloud Services is not possible. For technical reasons, this feature is only available to wikis hosted on the same server cluster and with access to the same database as Wikidata, like the official Wikimedia projects. I'm afraid you'll have to wait until Wikispore becomes part of this group.
Other solutions may be possible, like running scripts that insert the values into your wiki and update them when necessary. Lea Lacroix (WMDE) (talk) 09:01, 30 January 2020 (UTC)
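
A rough sketch of the script-based workaround mentioned above, assuming read-only use of Wikidata's action API (wbgetclaims); the item and property IDs in the usage comment are placeholders chosen for illustration:

```python
# Hypothetical sketch of the script-based workaround: read statement values
# from Wikidata's action API (wbgetclaims) so an external wiki can cache them,
# e.g. for infoboxes. IDs used in the example call below are placeholders.
import requests

API = "https://www.wikidata.org/w/api.php"

def statement_values(item_id, property_id):
    params = {
        "action": "wbgetclaims",
        "entity": item_id,
        "property": property_id,
        "format": "json",
    }
    claims = requests.get(API, params=params, timeout=30).json().get("claims", {})
    values = []
    for claim in claims.get(property_id, []):
        snak = claim["mainsnak"]
        if snak["snaktype"] == "value":
            values.append(snak["datavalue"]["value"])
    return values

# e.g. statement_values("Q67605965", "P856")  # official website of Wikispore
```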