Wikidata talk:WikiProject Schemas
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days. For the archive overview, see Archive/. The latest archive is located at Archive/2024.
Meeting Notes
14 Feb 2018
ericP: What is our mission?
Andra: To raise awareness and grow a community ... we can grow the WikiProject page over time ... use cases for ShEx ... namespace for ShEx: Lucas' idea from WikidataCon, we should host shapes on their own URI; what would that URI be?
ericP: Conflicting interests: one is to give ownership to the WD community; the conflicting one is to have them visible so people can copy and steal shapes even if they are outside the WD community ... maybe move schemas over to WD and then mirror to the shex schemas space when you want
Lucas: Creating a new namespace is probably not trivial. Even if no one argues, the technical side might be complicated.
Andra: Maybe postpone this until we have more use cases. ... Kat's http://wikidp.org/ demo: can we use this for additional domains by driving it with ShEx (a property checklist driven by ShEx)? ... We could create a generic version of the portal, containerize it, and then people could slot in their own shape expressions to create their own property checklists ... need shapes available through a URL to reuse Harold Solbrig's pyshex, so that is why I need a namespace for shape URIs
ericP: Demo manifests to run in Eric's or Jose's implementation, like the primer's "try it" links: 1. create manifests so ... good queries and validation tests that are either picked up remotely or use static data; create the schemas that will be shared, demo data, and manifests in a picklist ... demos show why validation is useful, give hints on how it is used in different domains, give people ideas ... wiki page with "try it" links: if we have a data structure, we can express it like this, which catches errors like this; help people
Andra: Create a page similar to the example queries
TODOs for next meeting:
- Lucas: ask around WMDE about how to request a new namespace
- Kat: create an example on the WikiProject page
- Andra: create an example on the WikiProject page
- Kat: paste notes in the talk page of the WikiProject
- ? Create Phabricator ticket for a new namespace?
Examples and tools
WikiProject ShEx has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. Could you please first provide examples of ShEx shapes that check particular data models in Wikidata, and guidelines on how to check Wikidata against these shapes? I'd prefer
- a web form tailored to Wikidata to edit and check shape expressions with syntax highlighting and typeahead, such as Wikidata query service
- a bot that regularly runs ShEx given at Wiki pages and posts the results, such as User:ListeriaBot
-- JakobVoss (talk) 07:15, 22 February 2018 (UTC)
- I've just added a "Tutorials and examples" section on the project homepage, with a very basic example on how to get started with ShEx2. Please help improving! (Thanks to Eric for fixing two minor issues in ShEx2!) Jneubert (talk) 12:06, 25 June 2018 (UTC)
- Updated version of How to get started with ShEx on Wikidata? - please help improving. --Jneubert (talk) 14:35, 25 July 2019 (UTC)
Wikidata ShEx Inference tool
Hi folks! I’ve been working on a tool to automatically infer ShEx schemas from Wikidata items, and a first version of the tool is now available at toolforge:wd-shex-infer (documentation). I would be very thankful if you could try it out and let me know how it works for you, preferably within the next two weeks (the tool will stay available after that, but eventually I’ll have to write and hand in my thesis). Let me know if you have any questions! --Lucas Werkmeister (talk) 12:33, 16 August 2018 (UTC)
- Some initial observations: This tool is a great idea and could potentially become very useful — thanks! It's understandable that only a small number of jobs can be run at any time, but it would be nice to be able to submit jobs into a queue if they cannot be run immediately. The tool tips when exploring the ShEx results are helpful. I haven't seen references covered in the ShEx output, but it would be handy to be able to run some jobs specifically to explore the data model used for references on items of particular types. --Daniel Mietchen (talk) 02:29, 18 August 2018 (UTC)
- @Daniel Mietchen: thanks! I’ll think about adding a job queue, depending on how many people use the tool. And currently, qualifiers and references are ignored, yes – I’m afraid that the way RDF2Graph works doesn’t really work well with them (it heavily relies on “instance of” and “subclass of” relations, so it would see all statement and reference nodes as equivalent, since they all have the type wikibase:Statement/wikibase:Reference). It might be possible to fix that, but I don’t think I’ll have time for that before my thesis is done. --Lucas Werkmeister (talk) 12:14, 22 August 2018 (UTC)
- Friendly reminder that the next few days would be an especially helpful time for feedback :) it should also be possible to run two jobs at once now. Please let me know if there are any problems! --Lucas Werkmeister (talk) 17:56, 28 August 2018 (UTC)
- I’ve also updated the tool to fix several problems with the simplification step, so now the schemas should look much nicer. For example, compare the shape for human (Q5) between job #11 and job #29 (both for “films that won ten or more Oscars”): five target classes for nominated for (P1411) were merged into one (award (Q618779)), as were nine target classes for award received (P166); eight target classes for country of citizenship (P27) were merged into two (political territorial entity (Q1048835) and political system (Q28108) – that second one is probably a bug in the data); and so on. You might even see completely new predicates be mentioned, because the tool drops any predicate with more than ten possible target classes (rationale: that’s pointless noise), so predicates which would previously have been dropped might now be included due to the target classes being merged. If you were dissatisfied with the schemas before, perhaps take another look? :) --Lucas Werkmeister (talk) 15:49, 6 September 2018 (UTC)
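The effect described in the last comment (merging target classes can bring a previously dropped predicate back) can be illustrated with a small sketch. This is not the tool's actual code: the function name, the `merged_into` mapping and the placeholder class IDs are assumptions made up for illustration; only the threshold of ten target classes and the IDs P166 and Q618779 come from the comment above.

```python
MAX_TARGET_CLASSES = 10  # threshold mentioned in the comment above


def simplify(predicate_targets, merged_into):
    """Merge target classes, then drop predicates that still have too many.

    predicate_targets: dict mapping a predicate ID to a set of target class IDs.
    merged_into: dict mapping a class ID to the broader class it is merged into
                 (hypothetical input; the real tool derives this itself).
    """
    result = {}
    for predicate, targets in predicate_targets.items():
        merged = {merged_into.get(cls, cls) for cls in targets}
        # "More than ten possible target classes" means the predicate is dropped.
        if len(merged) <= MAX_TARGET_CLASSES:
            result[predicate] = merged
    return result


# Eleven made-up award classes: too many on their own, fine once merged.
awards = {f"Q_award_{i}" for i in range(11)}
before = simplify({"P166": awards}, merged_into={})
after = simplify({"P166": awards}, merged_into={cls: "Q618779" for cls in awards})
print(before)  # {}
print(after)   # {'P166': {'Q618779'}}
```

Without merging, P166 has eleven target classes and is dropped; after merging them into one class it passes the threshold and reappears in the schema.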
You can now try Shape Expressions on a test system
Hello all,
The Wikidata team started working on support for Schemas, specifically Shape Expressions, to integrate a new extension into Wikidata, in order to store and reuse Schemas.
It’s still in development, but we wanted to share the first results with you, so you can give us early feedback.
On the test system, one can create and edit Schemas. You can see an example Schema here.
Please note that the multilingual labels, descriptions and aliases are not enabled for now; this is the next step we will work on. After that we will work on linking to a tool that allows you to check the Schema against a list of Items.
If you have any questions or remarks at that stage, please let me know by replying to this section :) If you want to create Phabricator tickets, you can use the tag Shape Expressions.
Cheers, Lea Lacroix (WMDE) (talk) 14:13, 26 February 2019 (UTC)
- Thanks for letting us know. I just tried it out and created O10. YULdigitalpreservation (talk) 19:29, 26 February 2019 (UTC)
- Sorry for the delayed reply. This is cool! I finally got around to it, but will put my (few) ShEx there. --Egon Willighagen (talk) 09:50, 22 April 2019 (UTC)
Improvements on ShEx test system
Hello all,
Our developers keep working hard on Shape Expressions, and we would love to have your feedback on the current version :)
Here's what has been improved recently:
- the "termbox" area of the page now displays several languages
- if you switch your interface from English to a language that has a label filled in, the title of the page will change accordingly
- if you want to add a label/description in a new language, two options are possible: you can switch your interface to this new language, and an editable line will appear in the table, or you can directly edit the URL to access the special page, e.g.
https://wikidata-shex.wmflabs.org/wiki/Special:SetSchemaLabelDescriptionAliases/O2/fr
- there is no longer an edit button at the top of the page, but the different sections are independently editable
- A new special page, Special:SchemaText, provides the raw text of the Schema in an external file. Example: https://wikidata-shex.wmflabs.org/wiki/Special:SchemaText/O2
And here's what is coming next:
- the "edit" buttons will be translated into the language of your interface
- we will add a button to check the schema in the validator tool
Feel free to try the interface on the test system, create new schemas, play around. If you find any issue, or if there is a feature/improvement that you would like to add, please let me know :)
Cheers, Lea Lacroix (WMDE) (talk) 09:14, 14 March 2019 (UTC)
- One thing that comes to mind is to be able to indicate what items the expression should run on. At this moment I am not entirely sure how to 'run' my ShEx on Wikidata. --Egon Willighagen (talk) 09:51, 22 April 2019 (UTC)
Please explain
What is the purpose, and how will it affect the existing structure that is opaque? That cannot be explained to me (I have asked repeatedly). What are the material benefits of this approach? Thanks, GerardM (talk) 16:45, 16 March 2019 (UTC)
Community requirements for data integrity
@GerardM: Hi, for completeness and to make sure we're addressing your issues, could you link to your previous requests for explanation?
While not all Wikidata communities require or even desire validation, it is essential for some of the more complex ones, e.g. GeneWiki (c.f. GeneWiki grant proposal). Such validation can be hand-rolled, but having a standard schema language offers obvious advantages in terms of tooling, completeness and ease of maintenance. Compiling even a simple ShEx schema to SPARQL produces a 10-100x explosion in line noise and scripting something with conjunction of JSON path expressions would require tooling investment and would require maintenance of a corpus of rules to enforce cardinality, data type consistency and structural coherence. It would be possible to invent a Wikidata-specific schema language but it would lack the tooling support that ShEx offers (validators in five languages, form-generation, import from UML/XMI, etc).
I've witnessed many publicly-curated databases lose relevance as their data rotted over time or changed structure so that potential users gave up trying to track it. Open PHACTS was founded specifically to provide integrity and consistency to Linked Data. Domain-specific databases typically have greater institutional investment because they offer integrity and consistency backed by schemas (e.g. UniProt, whose RDF structure reflects a conventional SQL (DDL) schema for genes and proteins). General knowledge stores have to add schema validation because their native schema is not domain-specific but instead one of generalized assertions, which can express incoherent data structures as easily as coherent ones.
Of course not all communities demand validation, but I believe that the offer of testable contracts to ensure the longevity and institutional investment in Wikidata more than justifies this effort.
--EricP (talk) 07:00, 18 March 2019 (UTC)
- When technology is introduced that enforces particular behaviour, it is all too easy to use the same technology elsewhere when at first glance a similar situation exists. So you have been abstract in your answer and it does not satisfy. I am familiar with SwissProt/UniProt from my Wikiprotein days. I know that Wikidata is not as good as Wikiprotein used to be. The quality of the data is not the issue, the issue is that a schema enforces. It follows that a certain "completeness" will be enforced and that is not necessarily a good thing. What I learned at Wikiprotein is how vital it is that people include information that is valid but not necessarily complete.
- In conclusion, what is it EXACTLY what you aim to achieve/enforce? Thanks, GerardM (talk) 11:11, 18 March 2019 (UTC)
- ShEx or any schema language is not about enforcing, it is more instrumental to checking for conformance. As a data-consumer I want to be able to check data consistency according to relevant data-models. Relevant to me, not necessarily to you. There are many cases where even within a single application multiple schemas could apply, depending on the use case. As you say it is crucial that people include data that is valid, not necessarily complete. There is no intention to enforce, only to be able to check the validity. --Andrawaag (talk) 11:40, 18 March 2019 (UTC)
- You asked EXACTLY what we aim to enforce. It would be tedious to enumerate everything but as an example, in Gene Wiki we want to know when an item on a protein doesn't have properties related to genes (e.g. chromosomal location) AND that a genomic build is missing as a qualifier to the statement on the gene location, making the statement nonsensical. When these inconsistencies occur, having flags that indicate them as part of the workflow helps tremendously in curating protein and gene information. Early prototypes of this system have already helped me fix errors. --Andrawaag (talk) 12:10, 18 March 2019 (UTC)
- That makes perfect sense. So in conclusion the intention is to signal structural issues in order to help people insert sensible information and to use it as a template to query those records that fail a "sanity" check. Thanks, GerardM (talk) 15:16, 18 March 2019 (UTC)
Update documentation
Hello dear ShEx enthusiasts!
Because we will release Schemas on Wikidata very soon, I'm currently reviewing the existing documentation. When I announce it, I expect a lot of people in the Wikidata community to wonder "what is it exactly? how can I write my own?"
The main links I'll redirect people to are your WikiProject page and Wikidata:WikiProject ShEx/How to get started?. Is this second page still up to date from your point of view?
I think that now would be a good time to give a bit of polish to the presentation of shape expressions. From the development team side, we will add technical documentation about the new extension and data type.
If you have any question or wish, feel free to ping me. Cheers, Lea Lacroix (WMDE) (talk) 15:11, 23 April 2019 (UTC)
Shape Expressions arrive on Wikidata on May 28th
See full announcement on the Project Chat :)
Thanks a lot to all of you who have been involved in discussing, suggesting improvements, testing the feature! Lea Lacroix (WMDE) (talk) 13:30, 19 May 2019 (UTC)
- Hello all,
- As announced here, we just released shape expressions on Wikidata. You can for example have a look at E10, the shape for human, or create a new EntitySchema.
- A few useful links:
- introduction to ShEx
- more details about the language
- More information about how to create a Schema
- Phabricator tag: shape-expressions
- User script to highlight items and properties in the schema code and turn the IDs into links
- If you have any question or encounter issues, feel free to ping me. Cheers, Lea Lacroix (WMDE) (talk) 16:07, 28 May 2019 (UTC)
- Property proposed: Shape Expression for class - Jheald (talk) 16:58, 28 May 2019 (UTC)
- User:Lea Lacroix (WMDE), can we please quickly add the EntitySchema namespace in the license footer? I guess that namespace is CC0 as well, right? —MisterSynergy (talk) 18:41, 28 May 2019 (UTC)
- Indeed it's CC0. Thanks for the reminder! I created a ticket. Lea Lacroix (WMDE) (talk) 07:18, 29 May 2019 (UTC)
Are the following validations possible?
1. Ensure that at least one statement for a given property (where multiple statements exist) has a value in a specified value set. If other statements exist for the property, ignore them. For example, validate that an item has at least the statement instance of (P31) sovereign state (Q3624078), but may also have other instance of (P31) statements that should be ignored. --Dhx1 (talk) 18:35, 28 May 2019 (UTC)
- 1. Yes, the keyword EXTRA says that other values of the property may appear. This is common for P31. This example shows a schema with a simple value set [<Qx> <Qy>]. (In many schemas, that's a value set of 1 element.) <Q2> fails <WithoutExtra> because it has an extra P31 (outside the value set), but it passes <WithExtra>. I added a <Q3> which has two P31s within the value set. There you don't need an EXTRA; you need instead to increase the number of expected P31s matching the value set. I added +, which is shorthand for {1,}, i.e. a minimum of 1 and no maximum. --EricP (talk)
2. Extract data on linked Wikidata items using EXTERNAL (?) or some other technique, allowing a country (P17) statement to be validated to ensure the linked item has a statement instance of (P31) sovereign state (Q3624078).
- 2. Yes, but you don't need EXTERNAL. If I understand the question, you just want your constraints to link to another resource in the wikidata world. I created a shape for national flags as an example. It has the constraint below (which 90% of flags fail, but...) to say that the NationalFlag must have a country with a given type. --EricP (talk)
wdt:P17 { wdt:P31 [wd:Q3624078] }
- @EricP: is it possible to go a step further than that, and say that the linked country from a given flag is not only of a certain type, but also itself has a flag (P163) of this same item? --Oravrattas (talk) 06:35, 8 July 2020 (UTC)
- at present, no, though there is a proposal to directly compare the value of some TripleConstraint against a property path, which is relatively simple to implement, and another to add more generic functions (example), which is more powerful but more complex. Aside from picking between the two, we also have to decide if we want to break the locality features of ShEx to add either one of them. --EricP (talk)
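The EXTRA and cardinality behaviour described in answer 1 can be modelled in a few lines. This is a deliberately simplified sketch of a single triple constraint with a value set, not a ShEx implementation; the function name and parameters are hypothetical, and the item values reuse the Qx/Qy placeholders from the answer (Qz is an added placeholder for a value outside the set).

```python
def check_values(values, value_set, min_count=1, max_count=1, extra=False):
    """Simplified model of one ShEx triple constraint with a value set.

    values: the values an item has for the property (e.g. its P31 values).
    max_count=None means unbounded, as in the "+" shorthand for {1,}.
    """
    inside = [v for v in values if v in value_set]
    outside = [v for v in values if v not in value_set]
    if outside and not extra:
        return False  # values outside the value set are only allowed with EXTRA
    if len(inside) < min_count:
        return False  # too few values from the value set
    if max_count is not None and len(inside) > max_count:
        return False  # too many values from the value set; EXTRA does not help here
    return True


VALUE_SET = {"Qx", "Qy"}
q2 = ["Qx", "Qz"]  # one value inside the set, one outside
q3 = ["Qx", "Qy"]  # two values inside the set

without_extra = check_values(q2, VALUE_SET)             # False
with_extra = check_values(q2, VALUE_SET, extra=True)    # True
q3_default = check_values(q3, VALUE_SET)                # False
q3_plus = check_values(q3, VALUE_SET, max_count=None)   # True
```

The last two calls mirror the point about Q3: two values inside the value set are a cardinality problem, so the fix is `+` rather than EXTRA.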
Validate in Blazegraph/query server ?
It would be interesting if these schemas could be used directly on the query server, i.e. to filter for items that match, check whether items match, and list errors. --- Jura 10:38, 29 May 2019 (UTC)
Running validation with API access (i.e. getStatements()) would greatly accelerate validation and reduce parsing and serialization effort on the query server. ---EricP (talk)
Structure e-entities ?
There are a few essential, but secondary elements sometimes included on entities:
- queries of items that could be validated
- lists of prefixes
I think the first could easily go into the long announced "query"-namespace. The second could probably be assumed in the configuration of whatever tool one uses, at least if they are WD prefixes. --- Jura 10:38, 29 May 2019 (UTC)
- Associating an item with each shape could help link the queries. --- Jura 09:36, 30 May 2019 (UTC)
links between entities schema
I couldn't figure out the way to refer from one entity schema to another: for instance, I would like to be able to write from the E36 entry point something like wdt:P629 @<someprefix:E35>
Is that possible? Is it the right pattern to have several EntitySchemas to describe different shapes of a schema? pinging the ShExperts ;) @Andrawaag, YULdigitalpreservation, Jelabra, Tombakerii: -- Maxlath (talk) 15:35, 29 May 2019 (UTC)
- I am definitely a ShEx beginner as well, but I have found the import command as described in [1] and [2], which looks promising. You can access the raw ShEx schema code via Special:EntitySchemaText (e.g. Special:EntitySchemaText/E10).
Unfortunately, I don't get it to work in the shex-simple tool, and I am not sure whether this is due to my poor ShEx skills, or some bug in the tool (error message is: "failed to create validator: loadImports@https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/browser/shex-webapp-webpack.js:53845:9 …"). —MisterSynergy (talk) 09:31, 30 May 2019 (UTC)
- I'll dive into this. @MisterSynergy, can you pass me an experiment that failed and I'll see if I can tweak it to make it succeed? (One requirement of IMPORT <XXX> is that XXX returns the schema without any HTML around it; another is that we don't get defeated by CORS issues, which require administration beyond my fingertips.) -- EricP (talk)
- For instance this one (sorry for the non-clickable link, there are several unmasked characters which I don't want to change in order not to break the link):
https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?schema=PREFIX%20rdf%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20%3A%20%3Chttps%3A%2F%2Fwww.example.org%2F%23%3E%0A%0Aimport%20%3Chttps%3A%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE48%3E%0Astart%20%3D%20%40%3Asportsperson%0A%0A%3Asportsperson%20EXTRA%20wdt%3AP106%20{%0A%20%20wdt%3AP106%20[%20wd%3AQ2066131%20]%3B%0A%23%20wdt%3AP22%20%40%3Chuman%3E%0A}&data=Endpoint%3A%20https%3A%2F%2Fquery.wikidata.org%2Fsparql&shape-map=SPARQL%20%27%27%27SELECT%20DISTINCT%20%3Fid%20WHERE%20{%20%3Fid%20wdt%3AP106%20wd%3AQ2066131%3B%20wdt%3AP22%20[]%20}%20LIMIT%2010%27%27%27@START&interface=human&regexpEngine=threaded-val-nerr
- It uses EntitySchema:E48 via Special:EntitySchemaText/E48 (raw shex without any HTML around—just click on it). I already tried several things, including this older version of E48 with prefixes. Note that E48 does not have a "start" command, as required for imported shape expressions. In the simple-shex tool, you'll see that the line that would actually make use of the imported shex is commented because it does not work anyways.
- The error message displayed in Google Chrome is "failed to create validator TypeError: Cannot read property 'keepImports' of undefined at loadImports (https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/browser/shex-webapp-webpack.js:53845:23)". Sounds like a Javascript issue, but I am not very experienced with that… Thanks for investigating, --MisterSynergy (talk) 15:28, 30 May 2019 (UTC)
- Import prefixes from EntitySchema:E49 would be nice too. --- Jura 09:41, 30 May 2019 (UTC)
- One engineering decision is whether that import would be just textual, like C's #include, or whether the prefixes (and inclusion thereof) should appear in the JSON (ShExJ) and RDF (ShExR) versions of the schema. You may want to raise a language issue with the tag "enhancement". -- EricP (talk)
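For anyone wanting to reproduce experiments like the one linked in this section, a link of that form can be assembled programmatically instead of by hand. The sketch below only reuses what is visible in the thread (the Special:EntitySchemaText URL pattern and the schema, data, shape-map, interface and regexpEngine query parameters of the shex-simple link); the helper names are hypothetical, and whether IMPORT actually resolves in the tool depends on the issues discussed above.

```python
from urllib.parse import quote, urlencode

SCHEMA_TEXT_BASE = "https://www.wikidata.org/wiki/Special:EntitySchemaText/"
VALIDATOR_BASE = (
    "https://tools.wmflabs.org/shex-simple/wikidata/"
    "packages/shex-webapp/doc/shex-simple.html"
)


def schema_text_url(schema_id):
    """URL of the raw ShEx text of an EntitySchema, e.g. 'E48'."""
    return SCHEMA_TEXT_BASE + schema_id


def importing_schema(schema_id, body):
    """Prefix a schema body with an IMPORT of another EntitySchema."""
    return f"IMPORT <{schema_text_url(schema_id)}>\n{body}"


def validator_link(schema, query):
    """Build a shex-simple link using the parameters seen in the thread."""
    params = {
        "schema": schema,
        "data": "Endpoint: https://query.wikidata.org/sparql",
        "shape-map": f"SPARQL '''{query}'''@START",
        "interface": "human",
        "regexpEngine": "threaded-val-nerr",
    }
    # quote (not quote_plus) so that spaces become %20, as in the link above.
    return VALIDATOR_BASE + "?" + urlencode(params, quote_via=quote)


schema = importing_schema("E48", "start = @<sportsperson>")
link = validator_link(schema, "SELECT ?id WHERE { ?id wdt:P106 wd:Q2066131 } LIMIT 10")
print(link)
```

Building the link this way percent-encodes every reserved character, which avoids the unmasked characters that made the hand-written link above non-clickable.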
What to do with duplicate schemas?
Hi all, since people are already working on their own schemas, and since we still haven't set up a list of all existing ones, there are already a couple of them that are basically the same thing, like E10, E14 and E48. What do we do in this case? Do we delete them or "reuse" them? --Sannita - not just another it.wiki sysop 14:32, 11 June 2019 (UTC)
- Hello,
- Not directly answering your question, I just wanted to point to a few tickets - we will continue improving the software in the future.
- Cheers, Lea Lacroix (WMDE) (talk) 08:58, 13 June 2019 (UTC)
- Some more input: I do not think that we should be concerned about duplicates at this point. ShEx is a relatively new functionality, there is quite a lot of dev work going on, and the community still needs to become familiar with it. According to [3], not that many EntitySchemas have been created so far. Later, we probably want to either merge duplicates (i.e. redirect the E-numbers), or simply allow "duplicated" EntitySchemas. Reuse does not seem to be a good idea, though. --MisterSynergy (talk) 09:24, 13 June 2019 (UTC)
CheckShex UserScript
I thought this project might be interested in a new userscript named CheckShex. It adds a field to items, properties, lexemes where you can enter an entitySchema and it will return whether it passes or fails. It also adds a field to entitySchemas, where you can do the reverse. The userscript can be installed to your common.js from User:Teester/CheckShex.js.
The userscript is backed by an api based on PyShEx (Q51672520). The api is located at https://tools.wmflabs.org/pyshexy/api and details about its use are at https://tools-static.wmflabs.org/pyshexy/. Teester (talk) 11:56, 22 June 2019 (UTC)
- Thanks for this great tool! Sometimes however, I get strange results: Checking Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202) against "E94", I get "Pass Fail" as message. When I hit "Check" again, I get "Fail". This behaviour seems not to be reproducible, but I encountered it once for 20th Century Press Archives (Q36948990), too. A hint may be that hitting "Check" again on an item page after "Pass" always results in "Fail". From E94, both items are validated consistently as passing. Jneubert (talk) 09:55, 20 July 2019 (UTC)
- Thanks. There was a bug in the userscript where, when you hit check more than once, the schema would be checked against itself rather than the item being checked against the schema. I wonder if the "Pass Fail" behaviour is from clicking "Check" a second time before the check is complete and running into the bug?
- Looking at the items, Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202) currently fails against E94 because of a missing parent organization (P749), while 20th Century Press Archives (Q36948990) currently passes. I get this result when using both the user script and the ShEx2 Validator. For the ShEx2 Validator, a query like this gets you just that item to validate:
SELECT ?item WHERE {BIND(wd:Q36948990 as ?item)} LIMIT 1
- A simpler way:
SELECT (wd:Q36948990 as ?item) WHERE {}
--Vladimir Alexiev (talk) 09:00, 9 January 2020 (UTC)
- Let me know if there are any other bugs or problems. Teester (talk) 14:21, 20 July 2019 (UTC)
- A big sorry - I'm currently figuring out possible workflows, and indeed have made E94 more strict, which causes it to fail with Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202), while it passes the new relaxed E95. This messed up the test case - sorry again!
- Now in multiple tests with some arbitrary clicking, I was not able to reproduce a case with "Pass Fail", so I suppose this is gone together with your bug fix, which also worked consistently well. Thank you for the quick fix! --Jneubert (talk) 08:01, 21 July 2019 (UTC)
- May I suggest a possible extension of the script? The API already returns the reason for failing (e.g., [4]). So it should be possible to show it to the user on request (with a popup/mouse-over perhaps, because the messages do not look nice, but are helpful nonetheless). --Jneubert (talk) 08:10, 21 July 2019 (UTC)
- Great idea. I've updated the user script so that now it shows some error information on failure. Now, if there's a missing or incorrect property in the response, the property number is shown beside the Fail message. Additionally, the raw error response is available on mouse over of the fail message. Teester (talk) 11:03, 23 July 2019 (UTC)
- This is fantastic - thank you so much. --Jneubert (talk) 14:37, 23 July 2019 (UTC)
- While adding the tool to the How to get started ... page, another possible improvement came to mind: On the item page, a tiny "schema" link, right of the validating result, would make it super-easy to navigate to the selected schema. --Jneubert (talk) 17:57, 23 July 2019 (UTC)
- I added more suggestions at User_talk:Teester/CheckShex.js#Usability --~~
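The single-item queries shown earlier in this section are easy to generate when testing many items against a schema. A minimal sketch, assuming the validator only needs an ?item binding as in those queries; the function names and the ID check are illustrative, not part of CheckShex or the ShEx2 Validator.

```python
import re


def _check_qid(qid):
    """Reject anything that does not look like a Wikidata item ID."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not an item ID: {qid!r}")
    return qid


def single_item_query(qid):
    """The 'simpler way' query from the thread, for one item."""
    return f"SELECT (wd:{_check_qid(qid)} AS ?item) WHERE {{}}"


def items_query(qids):
    """Same idea for several items, using a SPARQL VALUES clause."""
    values = " ".join(f"wd:{_check_qid(q)}" for q in qids)
    return f"SELECT ?item WHERE {{ VALUES ?item {{ {values} }} }}"


print(single_item_query("Q36948990"))
print(items_query(["Q575202", "Q36948990"]))
```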
Add saved queries to EntitySchema entries?
The "check entities against this Schema" link on the schema pages is a great thing. However, it requires newbies and experts alike to write a query from scratch, which is tedious. Some Schema authors are working around this by embedding example query code in the schema text as a comment, which helps, but looks a bit messy, and still needs manual copy+paste for transfer to the query field.
So it would be great if we could save a query, or even better, multiple named queries, with the schema. The code to load queries and allow for user selection is already in place (see ShEx2 on Toolforge) with the "dataLabel" and "queryMap" parameters in the manifest file (though perhaps not yet as an HTTP request query parameter).
On the Wikidata/Wikibase side, I wonder if setting the Wikidata SPARQL query equivalent (P3921) property could be enabled for EntitySchema entries. Together with subject named as (P1810) qualifiers, that would allow for multiple queries to be saved with each schema. --Jneubert (talk) 09:31, 21 July 2019 (UTC)
- You could add statements to items like Q64335281. I don't think Wikidata SPARQL query equivalent (P3921) is suitable for that though. You might want to re-read its definition. --- Jura 09:34, 21 July 2019 (UTC)
- My idea was to re-use the property definition at EntitySchema:E123 in order to use it there directly (formatted as a link to ShEx2), not to add the property to an item about the schema. --Jneubert (talk) 09:46, 21 July 2019 (UTC)
- This is currently not supported. Once an item is associated with a schema, you should be able to load its content on the schema page with LUA. --- Jura 10:09, 21 July 2019 (UTC)
Have there been improvements in this regard? Having a formal association of shape with query is essential for example for a reporting bot. --SCIdude (talk) 14:28, 10 November 2020 (UTC)
I think there is a solution if you have the shape in RDF, see https://stackoverflow.com/questions/65618009/rdf-namespace-that-can-describe-sparql-queries. It uses these namespaces:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix shex: <http://www.w3.org/ns/shex#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix spin: <http://spinrdf.org/sp#> .
<http://XYZ> a shex:Shape ; ... rdf:type skos:Concept .
<http://someuri> a spin:Select ; spin:text "SELECT ?item WHERE { ?item wdt:P31 xyz }" ; rdf:type skos:Concept .
<http://someuri> skos:related <http://XYZ> .
--SCIdude (talk) 08:56, 8 January 2021 (UTC)
Comparison between ShEx and constraints
[edit]What’s possible with ShEx that is not with constraints and vice versa ? What I got so far is:
- Constraints are tied to a property, shapes are « free » to be checked against any item and reused
- Constraints are somewhat easier to edit textually, more efficient
- Constraints are automatically checked by Mediawiki.
- Shapes are more powerful, for example it’s possible to express something like any property that is not authorized is impossible
Anything else/wrong ? It’s unclear to me how type shape constraints can be dealt with on Wikidata, as « rdf:type » is irrelevant in Wikidata items. Wikibase has domain and range constraints, I’m not sure these can be handled with shape expressions, as there seems to be no notion in ShEx analogous to SPARQL property paths. author TomT0m / talk page 19:42, 21 July 2019 (UTC)
- Correction, it’s definitely possible to express paths, my bad (this is used in the example shape for file formats, and for example shown in the 13th slide of this comparison between ShEx and SHACL). author TomT0m / talk page
Could we generate constraint from shapes and vice-versa ?
author TomT0m / talk page 19:42, 21 July 2019 (UTC)
Lack of help
[edit]There is also a lack of help. No mention of schemas in the Help namespace so far. There should be Help:Schemas just like Help:Constraints. -- JakobVoss (talk) 08:48, 25 October 2019 (UTC)
- There is Wikidata:WikiProject ShEx/How to get started? as the only subpage of this project at this point, and it clearly is "work in progress". We definitely need to collect some more experience with Schemas in Wikidata in order to come up with a helpful help page. —MisterSynergy (talk) 09:10, 25 October 2019 (UTC)
- In my opinion the technical references, links to standards and implementations should be removed. For an overview about ShEx in general there is en:ShEx. This page in contrast should focus on the use of ShEx in/for Wikidata. -- JakobVoss (talk) 12:46, 26 October 2019 (UTC)
Request a Schema page
[edit]Schemas are still hard for people for various reasons. We had the same problem with queries and one thing that beautifully helped was the Request a Query page. There anyone who doesn't know how to write sparql can ask for help from people who can. I think a similar Request a Schema page could be super helpful to get more wiki projects to adopt Schemas. Thoughts? --Lydia Pintscher (WMDE) (talk) 13:27, 31 October 2019 (UTC)
- Support Currently, there are only around 140 entity schemas. This number may be possibly improved with the creation of a dedicated page for schema related questions. John Samuel (talk) 13:32, 31 October 2019 (UTC)
- Support Fabulous idea. Do we have people willing to build schemas on request? - PKM (talk) 03:13, 15 November 2019 (UTC)
- Support Request a query is super-helpful, and an equivalent for schemas would be great. --Oravrattas (talk) 06:43, 8 July 2020 (UTC)
Human readable schemas
[edit]One of the biggest problems with schemas right now is that they are difficult to understand without sufficient technical knowledge. But it seems to me that it should be possible to translate a schema into human readable language without too much difficulty, for the most part.
For example, if my understanding of shex is correct, currently E10 could be translated as follows:
Schema | Translation |
---|---|
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX wd: <http://www.wikidata.org/entity/> PREFIX wdt: <http://www.wikidata.org/prop/direct/> start = @<human> <human> EXTRA wdt:P31 { wdt:P31 [wd:Q5]; wdt:P21 [wd:Q6581097 wd:Q6581072 wd:Q1097630 wd:Q1052281 wd:Q2449503 wd:Q48270]?; # gender wdt:P19 .; # place of birth wdt:P569 . + ; # date of birth wdt:P735 . * ; # given name wdt:P734 . * ; # family name wdt:P106 . * ; # occupation wdt:P27 @<country> *; # country of citizenship rdfs:label rdf:langString+; } <country> EXTRA wdt:P31 { wdt:P31 [wd:Q6256 wd:Q3024240 wd:Q3624078] +; } |
|
I could see this sort of thing being useful as part of a schema's talk page, similar to how property's talk pages contain a template containing useful information about a property and its constraints. Does anyone know of a service which will translate a schema into human readable language or vice versa? Teester (talk) 13:46, 16 November 2019 (UTC)
- Since there seems to be nothing that can translate schemas into human readable language, I've put something together at https://tools-static.wmflabs.org/shextranslator/ Any feedback would be appreciated. Teester (talk) 12:23, 23 November 2019 (UTC)
- Schemas have great potential to become a good tool, but, in their present implementation, I don't think we can or should expect users to rely on them as a primary means of understanding which properties to add or what statements to fix.
- A human readable version should always be outlined on a WikiProject page or with property constraints. --- Jura 12:42, 23 November 2019 (UTC)
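To make the translation idea in this thread concrete, here is a minimal Python sketch that turns single triple-constraint lines, written the way they appear in E10 above, into plain English. It is not how the shextranslator tool works (its internals are not described here); the function name describe, the regular expression and the wording of the output are all assumptions made for illustration, and only the simple "predicate value cardinality ; # comment" form is handled.

```python
import re

# Wording for the ShEx cardinality markers; no marker means exactly one.
CARDINALITY = {
    "": "exactly one",
    "?": "zero or one",
    "+": "one or more",
    "*": "zero or more",
}

# predicate, value expression, optional cardinality, ';', optional '# comment'
LINE = re.compile(r"^\s*(\S+)\s+(.+?)\s*([?+*]?)\s*;\s*(?:#\s*(.*))?$")


def describe(line: str) -> str:
    """Describe one simple triple constraint in plain English (illustrative only)."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not a simple triple constraint: {line!r}")
    predicate, value, card, comment = match.groups()
    # Use the trailing comment as the human-readable property name if present.
    name = f"{comment.strip()} ({predicate})" if comment else predicate
    if value == ".":
        what = "any value"
    elif value.startswith("@"):
        what = f"value(s) matching shape {value[1:]}"
    elif value.startswith("["):
        what = f"value(s) from {value}"
    else:
        what = f"value(s) of type {value}"
    return f"{name}: {CARDINALITY[card]}, {what}"


if __name__ == "__main__":
    for constraint in (
        "wdt:P19 .; # place of birth",
        "wdt:P569 . + ; # date of birth",
        "wdt:P27 @<country> *; # country of citizenship",
        "rdfs:label rdf:langString+;",
    ):
        print(describe(constraint))
```

A real translator would need a proper ShEx parser to cope with nested shapes, EXTRA and CLOSED, and would look up property labels instead of relying on comments.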
PyShexy and sparql query
[edit]https://tools-static.wmflabs.org/pyshexy/ Has anyone figured out a way to get it to work with a SPARQL query? I tried hard but failed; I get an HTTP 500 error. Example: query, pyshexy url --So9q (talk) 23:33, 25 November 2019 (UTC)
- https://tools.wmflabs.org/pyshexy/api?entityschema=E15&sparql=select%20*%20WHERE{%3Fitem%20dct%3Alanguage%20wd%3AQ9035.}limit%2020 —MisterSynergy (talk) 23:51, 25 November 2019 (UTC)
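For anyone building such links by script, the following is a small Python sketch of how the URL in the reply above can be assembled with the query percent-encoded. The endpoint and the two parameter names (entityschema, sparql) are taken from that link; the helper name pyshexy_url is made up here, and the sketch does not check whether the service is still reachable at this address.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Endpoint as it appears in the reply above.
PYSHEXY_API = "https://tools.wmflabs.org/pyshexy/api"


def pyshexy_url(entityschema: str, sparql: str) -> str:
    """Build an API URL with both parameters percent-encoded (spaces as %20)."""
    query = urlencode(
        {"entityschema": entityschema, "sparql": sparql},
        quote_via=quote,
    )
    return f"{PYSHEXY_API}?{query}"


if __name__ == "__main__":
    sparql = "select * WHERE{?item dct:language wd:Q9035.}limit 20"
    url = pyshexy_url("E15", sparql)
    print(url)
    # Decoding the query string gives back the original parameters.
    print(parse_qs(urlsplit(url).query))
```

Encoding the whole query this way keeps characters such as ?, { and spaces from being misread as part of the URL itself.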
Troubles
[edit]WikiProject Schemas has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
I'm a big fan of shapes, extensively reviewed the "Validating RDF" book, try to use them in my work, and we at Onto are helping with the rdf4j effort (though that's SHACL not SHEX). I'm quite enthusiastic about the Wikidata ShEx project and see a lot of good things.
But I tried to validate a realistic list, eg BG painters (this selects 100 of 310 on WD) against E10:
select ?item {?item wdt:P106 wd:Q1028181; wdt:P27 wd:Q219} limit 100
and I think the results are not quite usable yet.
PyShexy
[edit]PyShexy just gave up on me, even with limit 1 it returns "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."
shex.js
[edit]ShEx.js behaves better (paste the query in the box).
But there are still usability problems:
- Some of the errors are reported many times, e.g. (I cut to only the first few), cc @EricP:
wd:Q284264@!START validating http://www.wikidata.org/entity/Q284264 as //www.wikidata.org/wiki/Special:EntitySchemaText/human: validating http://www.wikidata.org/entity/Q12287013: Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 wd:Q6957611@!START validating http://www.wikidata.org/entity/Q6957611 as //www.wikidata.org/wiki/Special:EntitySchemaText/human: Missing property: http://www.wikidata.org/prop/direct/P19 OR Missing property: http://www.wikidata.org/prop/direct/P569 OR Missing property: http://www.wikidata.org/prop/direct/P19 OR Missing property: http://www.wikidata.org/prop/direct/P569 OR Missing property: http://www.wikidata.org/prop/direct/P19 wd:Q11317581@!START validating http://www.wikidata.org/entity/Q11317581 as //www.wikidata.org/wiki/Special:EntitySchemaText/human: Missing property: http://www.wikidata.org/prop/direct/P19 OR Missing property: http://www.wikidata.org/prop/direct/P19 OR Missing property: http://www.wikidata.org/prop/direct/P19 OR Missing property: http://www.wikidata.org/prop/direct/P19 OR Missing property: http://www.wikidata.org/prop/direct/P19 OR Missing property: http://www.wikidata.org/prop/direct/P19 wd:Q12283051@!START validating http://www.wikidata.org/entity/Q12283051 as //www.wikidata.org/wiki/Special:EntitySchemaText/human: validating http://www.wikidata.org/entity/Q12299788: validating http://www.wikidata.org/entity/Q12283051: validating http://www.wikidata.org/entity/Q28194288: Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 OR validating http://www.wikidata.org/entity/Q28194288: Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: 
http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19 Missing property: http://www.wikidata.org/prop/direct/P19
- It takes over 10s for some of the more difficult items. This isn't scalable.
WikiShape
[edit]http://wikishape.weso.es/ by @Jelabra: runs validations in parallel so even though the hard items (eg 1,2,4 the count is zero-based) are spinning 10 min already, I can inspect the easier items.
- I think the hard items have relatives, so they cause recursive validation (see next section) and I'm doubtful their validation will ever finish.
- Parallel threads reuse validations of subsidiary entries, which is great: after 100, it added 27 "country", "language" and "human", and each is checked only once.
- The error messages are quite hard to grok, see below for 6 wd:Q3650675. It'd take me probably 20 min to understand what's wrong.
Error: None of the candidates matched. Attempt: Attempt: node: wd:Q3650675, shape: <internal://base/human> Bag: C0,C1?,C2,C3+,C4*,C5*,C6*,C7*,C8*,C9*,C10*,C11*,C12*,C13*,C14*,C15*,C16*,C17+,C18* Candidate lines: CandidateLine: ((<http://www.wikidata.org/prop/direct/P31>,<http://www.wikidata.org/entity/Q5>),C0) ((<http://www.wikidata.org/prop/direct/P21>,<http://www.wikidata.org/entity/Q6581097>),C1) ((<http://www.wikidata.org/prop/direct/P569>,"1827-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>),C3) ((<http://www.wikidata.org/prop/direct/P735>,<http://www.wikidata.org/entity/Q15501913>),C4) ((<http://www.wikidata.org/prop/direct/P106>,<http://www.wikidata.org/entity/Q1028181>),C6) ((<http://www.wikidata.org/prop/direct/P27>,<http://www.wikidata.org/entity/Q219>),C7) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@de),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Никола Образописов"@bg),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@sq),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@nl),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@es),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@en),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@ga),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@fr),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@pt),C17) CandidateLine: ((<http://www.wikidata.org/prop/direct/P21>,<http://www.wikidata.org/entity/Q6581097>),C1) ((<http://www.wikidata.org/prop/direct/P569>,"1827-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>),C3) ((<http://www.wikidata.org/prop/direct/P735>,<http://www.wikidata.org/entity/Q15501913>),C4) ((<http://www.wikidata.org/prop/direct/P106>,<http://www.wikidata.org/entity/Q1028181>),C6) ((<http://www.wikidata.org/prop/direct/P27>,<http://www.wikidata.org/entity/Q219>),C7) 
((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@de),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Никола Образописов"@bg),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@sq),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@nl),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@es),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@en),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikola Obrazopisov"@ga),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@fr),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Nikolai Obrazopisov"@pt),C17) ((<http://www.wikidata.org/prop/direct/P31>,<http://www.wikidata.org/entity/Q5>),C18)
Or look at 10 wd:Q3804651:
Error: None of the candidates matched. Attempt: Attempt: node: wd:Q3804651, shape: <internal://base/human> Bag: C0,C1?,C2,C3+,C4*,C5*,C6*,C7*,C8*,C9*,C10*,C11*,C12*,C13*,C14*,C15*,C16*,C17+,C18* Candidate lines: CandidateLine: ((<http://www.wikidata.org/prop/direct/P31>,<http://www.wikidata.org/entity/Q5>),C0) ((<http://www.wikidata.org/prop/direct/P21>,<http://www.wikidata.org/entity/Q6581097>),C1) ((<http://www.wikidata.org/prop/direct/P569>,"1864-05-18T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>),C3) ((<http://www.wikidata.org/prop/direct/P735>,<http://www.wikidata.org/entity/Q21104340>),C4) ((<http://www.wikidata.org/prop/direct/P106>,<http://www.wikidata.org/entity/Q1028181>),C6) ((<http://www.wikidata.org/prop/direct/P27>,<http://www.wikidata.org/entity/Q219>),C7) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@ga),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@en),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@ast),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@nl),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@de),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@it),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@sq),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@fr),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@es),C17) CandidateLine: ((<http://www.wikidata.org/prop/direct/P21>,<http://www.wikidata.org/entity/Q6581097>),C1) ((<http://www.wikidata.org/prop/direct/P569>,"1864-05-18T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>),C3) ((<http://www.wikidata.org/prop/direct/P735>,<http://www.wikidata.org/entity/Q21104340>),C4) ((<http://www.wikidata.org/prop/direct/P106>,<http://www.wikidata.org/entity/Q1028181>),C6) ((<http://www.wikidata.org/prop/direct/P27>,<http://www.wikidata.org/entity/Q219>),C7) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@ga),C17) 
((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@en),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@ast),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@nl),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@de),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@it),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@sq),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@fr),C17) ((<http://www.w3.org/2000/01/rdf-schema#label>,"Ivan Angelov"@es),C17) ((<http://www.wikidata.org/prop/direct/P31>,<http://www.wikidata.org/entity/Q5>),C18)
Recursive Shapes
[edit]E10 includes recursive shape refs.
<human> EXTRA wdt:P31 { wdt:P22 @<human> *; # father wdt:P25 @<human> *; # mother wdt:P3373 @<human> *; # sibling wdt:P26 @<human> *; # husband/wife wdt:P40 @<human> *; # children wdt:P1083 @<human> *; # relatives
But some British politician ancestries have been tracked back to Adam (through some uncertain/fictional rulers). So if you follow all these links recursively, back and forth, you may pick up a majority of Humans on WD (5-6M). So following such recursion faithfully is suicide, and shex.js does seem to recurse faithfully:
wd:Q2989196@!START validating http://www.wikidata.org/entity/Q2989196 as //www.wikidata.org/wiki/Special:EntitySchemaText/human: validating http://www.wikidata.org/entity/Q3657670: validating http://www.wikidata.org/entity/Q2989196: validating http://www.wikidata.org/entity/Q4162892: validating http://www.wikidata.org/entity/Q35228: Missing property: http://www.wikidata.org/prop/direct/P31 OR validating http://www.wikidata.org/entity/Q4162892: validating http://www.wikidata.org/entity/Q35228: Missing property: http://www.wikidata.org/prop/direct/P3
What we need instead is something like:
<human> EXTRA wdt:P31 { wdt:P31 [wd:Q5]; wdt:P22 @<mini_human> *; # father wdt:P25 @<mini_human> *; # mother wdt:P3373 @<mini_human> *; # sibling wdt:P26 @<mini_human> *; # husband/wife wdt:P40 @<mini_human> *; # children wdt:P1083 @<mini_human> *; # relatives ... } <mini_human> EXTRA wdt:P31 { wdt:P31 [wd:Q5]; }
So it's really easy for a schema writer to shoot himself in the foot.
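To illustrate the difference in cost, here is a toy Python sketch. It only counts which entities a validator would have to fetch; it does not validate anything and is not how shex.js or any other engine is implemented. The three-item graph, its Q-ids and both function names are invented for the example; the property list is the one from the E10 excerpt above.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: item -> {property: [linked items]}.
GRAPH: Dict[str, Dict[str, List[str]]] = {
    "Q1": {"P31": ["Q5"], "P22": ["Q2"]},
    "Q2": {"P31": ["Q5"], "P22": ["Q3"], "P40": ["Q1"]},
    "Q3": {"P31": ["Q5"], "P40": ["Q2"]},
}

# Relative properties that point back at <human> in the E10 excerpt.
RELATIVE_PROPS = ("P22", "P25", "P3373", "P26", "P40", "P1083")


def fetched_with_full_recursion(item: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Entities touched when every relative is itself validated as <human>."""
    seen = set() if seen is None else seen
    if item in seen:  # cycle guard, otherwise father/child links loop forever
        return seen
    seen.add(item)
    for prop in RELATIVE_PROPS:
        for relative in GRAPH.get(item, {}).get(prop, []):
            fetched_with_full_recursion(relative, seen)
    return seen


def fetched_with_mini_human(item: str) -> Set[str]:
    """Entities touched when relatives only get the <mini_human> existence check."""
    fetched = {item}
    for prop in RELATIVE_PROPS:
        fetched.update(GRAPH.get(item, {}).get(prop, []))
    return fetched


if __name__ == "__main__":
    print(sorted(fetched_with_full_recursion("Q1")))  # whole connected family
    print(sorted(fetched_with_mini_human("Q1")))      # item plus direct relatives
```

With full recursion the set grows to everything reachable through family links, while the <mini_human> variant stays bounded by the number of direct relatives of the item being checked.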
Discussion
[edit]I've thought a lot about shape validation performance and scalability, and I think that fetching entities ad nauseam (esp. through numerous SPARQL queries) can never scale. What we need is for ShEx engines to strictly enforce limits on what's checked about referenced WD entities: basically we need an "existence check" but not full recursive checking.
@EricP, Jelabra: what do you think? --Vladimir Alexiev (talk) 09:41, 9 January 2020 (UTC)
- On pyshex: this one works. You have given incorrect input in the sparql= parameter. —MisterSynergy (talk) 11:27, 9 January 2020 (UTC)
- On Blaze: @Vladimir Alexiev: we've raised [an issue](https://phabricator.wikimedia.org/T243595) to move validation to a Blaze instance so we're not spending 99% of our time waiting for SPARQL scheduling. – The preceding unsigned comment was added by EricP (talk • contribs) at 07:50, January 24, 2020 (UTC).
shex-simple
[edit]For the tool at
Sample link:
- https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?data=Endpoint:%20https://query.wikidata.org/sparql&hideData&manifest=[]&textMapIsSparqlQuery&schemaURL=%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE10
is there a way to link the sparql query in the url? (To avoid having to paste it into the query field). --- Jura 01:11, 10 February 2020 (UTC)
- I think a URL parameter "shape-map" does the work here: https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?data=Endpoint:%20https://query.wikidata.org/sparql&hideData&manifest=[]&textMapIsSparqlQuery&schemaURL=%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE10&shape-map=SELECT%20?item%20WHERE%20{%20?item%20wdt:P31%20wd:Q5%20}%20LIMIT%205 —MisterSynergy (talk) 07:11, 10 February 2020 (UTC)
- Thanks @MisterSynergy:! Is there a way to autorun it? --- Jura 08:17, 10 February 2020 (UTC)
- No idea. I meanwhile do not use this shex-simple tool any longer, as it seems to be very basic in functionality and there are other ones available. I like the pyshexy API that is linked about two sections above this one most at this time. --MisterSynergy (talk) 08:39, 10 February 2020 (UTC)
- Interesting. I will try to add both to Talk:Q4925477. --- Jura 08:58, 10 February 2020 (UTC)
- No idea what your exact plan is, but for your convenience, User:Teester/CheckShex.js might also be worth a try... It adds an input field on item pages where you just need to provide an E-number for an evaluation of that item. --MisterSynergy (talk) 09:08, 10 February 2020 (UTC)
- The idea is to provide a link to the check the item against a schema. There are a few other approaches at "This item:" in the list. --- Jura 09:27, 10 February 2020 (UTC)
ShExStatements
[edit]During Wiki Techstorm 2019 [1], we started exploring simplification for creating shape expressions. One possibility that was explored was to make something like QuickStatements that will take CSV/tabular format as input to generate shape expressions.
ShExStatements is now released: https://github.com/johnsamuelwrites/ShExStatements
The main goal is to help newcomers write shape expressions. The users write a CSV file and ShExStatements will translate it to a shex file.
Take for example, a CSV file concerning a language (with prefixes): https://github.com/johnsamuelwrites/ShExStatements/blob/master/examples/language.csv is translated to a shape expression [2].
There are five columns. Column 1 is used for specifying the node name, 2 for specifying the property, 3 for one or more possible values, 4 for cardinality (+,*) and 5 for comments.
Columns 3,4 and 5 are empty for prefixes. Columns 1, 2, 3 are mandatory. Column 3 can be . (to say any value).
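As a rough illustration of the five-column idea, here is a Python sketch that reads such rows and prints a shape. It is not the ShExStatements implementation and does not claim to follow its actual file syntax: the comma delimiter, the whitespace-separated value list, the output layout and the function name rows_to_shex are all assumptions made for this example.

```python
import csv
import io
from typing import Dict, List


def rows_to_shex(text: str) -> str:
    """Turn five-column rows (node, property, values, cardinality, comment) into ShExC-like text."""
    prefixes: List[str] = []
    shapes: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        cells += [""] * (5 - len(cells))
        node, prop, value, card, comment = cells[:5]
        if not value:
            # Prefix row: columns 3, 4 and 5 are empty.
            prefixes.append(f"PREFIX {node}: {prop}")
            continue
        # "." means any value; otherwise treat the cell as a value set.
        expr = "." if value == "." else "[ " + " ".join(value.split()) + " ]"
        line = f"  {prop} {expr}{' ' + card if card else ''} ;"
        if comment:
            line += f" # {comment}"
        shapes.setdefault(node, []).append(line)
    out = list(prefixes)
    for node, lines in shapes.items():
        out.append(f"<{node}> {{")
        out.extend(lines)
        out.append("}")
    return "\n".join(out)


EXAMPLE = """\
wd,<http://www.wikidata.org/entity/>
wdt,<http://www.wikidata.org/prop/direct/>
language,wdt:P31,wd:Q34770,,instance of a language
language,wdt:P1705,.,+,native label
"""

if __name__ == "__main__":
    print(rows_to_shex(EXAMPLE))
```

For the real input format and options, the documentation linked in [5] is the place to look.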
Examples related to Wikidata that were used to create some entity Schemas E177, E178, E179 can be found here [3], with some additional examples in [4].
For a detailed documentation, please check [5].
Please let me know if you have any questions/remarks.
- https://medium.com/@jsamwrites/wiki-techstorm-2019-a996d69c60a5
- https://github.com/johnsamuelwrites/ShExStatements#quick-start
- https://github.com/johnsamuelwrites/ShExStatements/tree/master/examples/wikidata
- https://github.com/johnsamuelwrites/ShExStatements/tree/master/examples
- https://github.com/johnsamuelwrites/ShExStatements/blob/master/docs.md
Experimenting with Bioschemas at Scholia
[edit]For Scholia, we have begun to explore how to annotate entities using Bioschemas (Q93995803). You can see this in action at taxon profiles like toolforge:scholia/taxon/Q12024, whose HTML now includes the following:
/* BioSchemas annotation */
if (item.claims.P225) {
  try {
    /* Taxon */
    var taxonName = item.claims.P225[0].mainsnak.datavalue.value;
    bioschemasAnnotation = {
      "@context" : "https://schema.org",
      "@type" : "Taxon" ,
      "name" : taxonName ,
      "url" : "http://www.wikidata.org/entity/Q12024"
    }
    if (item.claims.P105) {
      var taxonRank = item.claims.P105[0].mainsnak.datavalue.value.id;
      bioschemasAnnotation.taxonRank = "http://www.wikidata.org/entity/" + taxonRank ;
    }
    if (item.claims.P171) {
      var parent = item.claims.P171[0].mainsnak.datavalue.value.id;
      bioschemasAnnotation.parentTaxon = "http://www.wikidata.org/entity/" + parent ;
    }
    $( '#bioschemas' ).append( JSON.stringify(bioschemasAnnotation) );
    // console.log(JSON.stringify(bioschemasAnnotation, "", 2))
  } catch(e) {}
}
In the process, we were wondering to what extent such Wikidata-generic annotations could be represented on Wikidata rather than hardcoded on the Scholia end, and are inviting your comments, here or via a currently open pull request for similar annotation of molecular entities. --Daniel Mietchen (talk) 20:24, 11 May 2020 (UTC)
New subpage to document and explore subsets of Wikidata
[edit]as per Wikidata:WikiProject Schemas/Subsetting. --Daniel Mietchen (talk) 09:51, 4 June 2020 (UTC)
Date-conditional checks
[edit]WikiProject ShEx has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
Is it possible to have a check for election items that says something like "There should be a successful candidate (P991) unless there's a point in time (P585) that's in the future"? Or would we need to have two separate schemas for "future elections" and "past elections"? --Oravrattas (talk) 13:23, 8 July 2020 (UTC)
- @Oravrattas: So far as I understand, the checking process will always report failure for missing fields. I think the solution would be to only have one schema, no checks for special conditions, but there could be a second process which runs either before or after the schema check to exclude items which have certain characteristics. Blue Rasberry (talk) 22:29, 1 August 2020 (UTC)
- @Bluerasberry: I might be missing something about your suggestion, but I'm not sure how that would help. I don't want to skip all checks on future elections: I still want to ensure that those have a date, a jurisdiction, an office contested, candidates, the previous election, etc, but there are properties that will only have values once the election has passed, such as the number of votes cast, and the winning candidate. How can I validate that those do not have any value yet if the date of the election is in the future, but should if it's in the past? --Oravrattas (talk) 08:08, 2 August 2020 (UTC)
- @Oravrattas: As far as I know, there is no easy option to compare dates. You could maybe use regular expressions for it. If you do so, you do not need multiple shape expressions; however, you will need multiple shapes. You can combine shapes via "or" and "and", so <#election> { ... } and (@<#election-past> or @<#election-future>), with the shape for future elections using a regular expression that matches future dates, and the shape for past elections requiring successful candidate (P991) and point in time (P585). The latter might also have an expression for past dates so that no future election can have an outcome already. --CamelCaseNick (talk) 20:45, 2 August 2020 (UTC)
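The rule being asked for can also be written down outside ShEx, along the lines of the "second process" suggested earlier in this thread. The Python sketch below is only that: a plain date comparison on already-extracted values, not a ShEx feature. The claims dictionary layout, the function name election_problems and the message texts are assumptions for illustration; only P585 and P991 come from the discussion.

```python
from datetime import date
from typing import Dict, List


def election_problems(claims: Dict[str, object], today: date) -> List[str]:
    """Check the date-conditional rule for one election item.

    `claims` is a hypothetical, simplified view of an item:
    {"P585": date(...), "P991": ["Q..."]}.
    """
    when = claims.get("P585")
    if when is None:
        return ["missing point in time (P585)"]
    has_winner = bool(claims.get("P991"))
    problems: List[str] = []
    if when > today and has_winner:
        problems.append("future election already has successful candidate (P991)")
    if when <= today and not has_winner:
        problems.append("past election lacks successful candidate (P991)")
    return problems


if __name__ == "__main__":
    today = date(2020, 8, 2)
    print(election_problems({"P585": date(2020, 11, 3)}, today))  # future, no winner yet
    print(election_problems({"P585": date(2019, 5, 1)}, today))   # past, winner missing
```

The unconditional checks (date, jurisdiction, office contested and so on) can stay in a single schema, with this kind of step run before or after it.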
EntitySchema labels
[edit]EntitySchema labels don't seem to be working for me - do they need to be updated to a new format? Sj (talk) 18:56, 30 July 2020 (UTC)
Early test schemas - still in use? What happens when the incrementer gets to 733?
[edit]There are a few schemas with odd numbers: E734 (family name), E735 (given name), E999 (Borked), E11424 (film). Are these actually in use? What was the mechanism of generating those EIDs, do we want to keep it, will the ID incrementing pass over them smoothly? Sj (talk) 18:59, 30 July 2020 (UTC)
- @Sj: I see - it looks like at some point, there were 2+ processes for assigning item numbers, and now hopefully there should only be the automated one. You are asking what happens when the canonical current system counts to the item numbers which previous earlier systems assigned.
- So far as I know, the current practice is keeping all schemas, even test schemas, regardless of whether anyone uses them. I expect that the current desired practice is for the number assigning process to skip any existing numbers, not write over them. Eventually I suppose we should have inclusion or notability criteria for schemas, or otherwise, anyone could automatically generate countless schemas never to be used. Blue Rasberry (talk) 22:26, 1 August 2020 (UTC)
- Thank you kindly. Sj (talk) 15:24, 17 August 2020 (UTC)
Creating schemas for basic concepts
[edit](Reposting here:)
The entity schemas E3 (Wikidata Item), E5 (Statement), E6 (Language mappings), E7 (Citation), E8 (External RDF), and E9 (Wikidata-Wikibase) are all blank.
- There should be something there, even if it is all comments and optional elements.
- Is there a quick way to find other blank entity schemas?
- Was there discussion about this when schemas were being created for the first time?
Sj (talk) 15:28, 17 August 2020 (UTC)
Help in creating Schema for Pokémon species
[edit]Hi, I don't know how to create a correct Schema for a Pokémon species. Could someone help me? Item QYYY which defines a Pokémon species must have:
- Pokémon type: instance of (P31) should have one or two Pokémon type(s) (a Pokémon type is an item with subclass of (P279) as Pokémon species (Q3966183) and subclass of (P279) as an item which has subclass of (P279) as Pokémon type (Q1266830)). Both must have applies to part, aspect, or form (P518) + first type (Q25931659) or second type (Q25931668) as qualifier of the type;
- part of (P361): a Pokémon species is at least part of:
- a generation (an item which has part of (P361) + an item with instance of (P31) + Pokémon generation (Q3759600))
- an evolutionary line or no evolutionary line:
- if the Pokémon species doesn't evolve, it must have Pokémon without evolution (Q25707067) with follows (P155) which is either novalue or Pokémon egg (Q18129517) and followed by (P156) with novalue
- if the Pokémon species evolves, it must have an item which defines its evolutionary line (item with instance of (P31) + Pokémon evolutionary line (Q15795637) and has part(s) (P527) with QYYY and series ordinal (P1545) with a qualifier). Moreover, follows (P155), followed by (P156) and series ordinal (P1545) must be correctly mapped
- a game (an item which has instance of (P31) + list of Pokémon in a game (Q99973598))
- an egg group (an item which has subclass of (P279) + Egg group (Q26037540))
- mass (P2067) in kg and lb must be defined
- color (P462) must be defined
- height (P2048) must be defined in m/cm
- Pokémon index (P1685): It must have a number, P155, P156 and catalog (P972) + an item with subclass of (P279) + Pokédex (Q1250520) (at least catalog (P972) must be defined with National Pokédex (Q20005020))
- media franchise (P8345) + Pokémon (Q864)
- Bulbapedia article ID (P4845) must be defined and usually it is a string formed by the English label with spaces replaced with underscore and "_(Pokémon)"
Thank you very much to those who will help me! --★ → Airon 90 13:01, 5 October 2020 (UTC)
- @Airon90: I see that you are active at Wikidata:WikiProject Pokémon, and presumably if you learned this then you would bring the practice back to that WikiProject. I am still learning this myself and I do not know how to help, but I wanted to thank you both for asking the question and doing documentation at that other WikiProject. Pokemon are very important for the history of Wikipedia as the origin of English Wikipedia's notability policy. There is a lot of interest in good Pokemon content everywhere, so I think we should get this right. Blue Rasberry (talk) 14:28, 18 November 2020 (UTC)
- So has anyone done this, because it's been like 3 years and I think I can just go ahead and do it if nobody else is. OmegaFallon (talk) 15:54, 10 March 2023 (UTC)
- Schema is located at E394, any additions or tweaks are appreciated. I also recommend checking out the constraints on Pokémon index (P1685) as a guideline for the schema. @Airon90 @Bluerasberry OmegaFallon (talk) 16:39, 10 March 2023 (UTC)
Best way to browse schemas?
[edit]Excuse me if I am missing this. How can I browse schemas?
I want to see schemas for instance of (P31) = human (Q5). Among other things, I am hoping to identify the most common properties among schemas, but I also would like to be able to browse individual schemas. I do not see how to search for schemas around a given theme. Thanks. Blue Rasberry (talk) 14:25, 18 November 2020 (UTC)
- The answer is User:HakanIST/EntitySchemaList
- Right now there are only 264 schemas. The reason I could not find many is because hardly any exist. All of this is still new. Blue Rasberry (talk) 21:14, 18 November 2020 (UTC)
- @Bluerasberry:
- Yes, it is still quite new and a bit too technical to be widely adopted yet.
- That said, you can just use Special:Search and ask for results in the EntitySchema namespace only.
- Cdlt, VIGNERON (talk) 14:33, 22 November 2020 (UTC)
lightweight Shapes
[edit]I think there is need for a lightweight approach. In other words the full Shex specification may be too powerful to be implemented for e.g. a daily check of millions of entities. However, a lightweight approach would reduce the worth of the shape set we already have, unless there is an automatic conversion to the lightweight form. Practically another standard is needed. Comments? --SCIdude (talk) 09:55, 19 December 2020 (UTC)
- Seems to require a little more concrete data than « may be too powerful ». Which concrete shape is a problem ? There are already implementations in the wild, is it really needed to reimplement everything ?
- My guess is that complex shape constructions are already a bit too hard to write for most people. author TomT0m / talk page 11:13, 19 December 2020 (UTC)
- You mean there are implementations that can handle this? Let's talk a real example: I have a list of 1.6M entities and a list of shapes; the task is to provide, for each shape, a list of items out of the 1.6M that are valid for each shape, and the result should not be older than 24 hours since the full WD dump download. Bonus if it can be done on a quad-core desktop with 32G of RAM. --SCIdude (talk) 16:32, 19 December 2020 (UTC)
Complex constraints
Are constraints such as the complex constraint written on neutron number (P1148), which is intended to be used on isotope items and checks that the number used as a value is the sum of two property values, expressible in ShEx? I see this is in principle possible with SHACL.
So now that we have schemas, are these {{Complex constraint}}s better written as schemas? Are there reports in the same fashion that could report errors? author TomT0m / talk page 14:42, 25 November 2021 (UTC)
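To make the kind of arithmetic check in question concrete, here is a minimal sketch outside of ShEx. It assumes the relation being tested is the physical identity mass number = atomic number + neutron number; the function name and the plain integer arguments are illustrative and do not reflect how the existing complex constraint is actually written.

```python
def neutron_number_is_consistent(mass_number: int,
                                 atomic_number: int,
                                 neutron_number: int) -> bool:
    """Check the assumed relation: mass number = atomic number + neutron number."""
    return mass_number == atomic_number + neutron_number


# Carbon-14: 6 protons and 8 neutrons give mass number 14.
print(neutron_number_is_consistent(14, 6, 8))
# A wrong neutron number for the same isotope is flagged.
print(neutron_number_is_consistent(14, 6, 7))
```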
Redirect project schema request page
I think we should redirect Wikidata:WikiProject_Schemas/Request an EntitySchema to Wikidata:Schema proposals to be the future canonical place for requests. If there are no objections in the next few days I'll go ahead and do it (supporting votes also welcome). --SilentSpike (talk) 22:07, 30 December 2021 (UTC)
- I think we should not do this because the Request a Schema page was specifically set up for people to ask for help in creating a schema if they are not proficient in ShEx. This is similar to the Request a Query page we have and has been requested in several discussions about how schemas are too hard for people. LydiaPintscher (talk) 11:58, 31 December 2021 (UTC)
- That being said the page for sure could use some love and attention. LydiaPintscher (talk) 11:59, 31 December 2021 (UTC)
- Ah I see now the distinction between proposal and request. In that case I retract my suggestion above. SilentSpike (talk) 14:01, 31 December 2021 (UTC)
Thoughts about ShEx integration in Wikidata and OpenRefine
Not being aware of this WikiProject I posted some thoughts and questions over at Wikidata talk:Schemas, maybe it is of interest to people here. − Pintoch (talk) 15:15, 23 July 2022 (UTC)
Items for EntitySchemas?
Hello!
I am missing a way to query entity schemas with SPARQL (e.g. to search for all schemas that use a given property).
Should we have Wikidata items for each schema?
That way we can also track schemas on focus lists of Wikiprojects, and better categorize them, alongside all the other cool linked data stuff we love. TiagoLubiana (talk) 14:43, 19 October 2022 (UTC)
- I also am trying to put together a schema that will check a collection of items for validity. None of the examples have a query that produces the target set of items. Are there schemas that have a good target set and how does one set up a schema with a query to get a set of violations of the schema in a way that can be used to help editors that do not have knowledge of the schema language?
- My goal is to have a schema that reports violations of the intended meaning of is metaclass for (P8225). Peter F. Patel-Schneider (talk) 07:52, 15 August 2023 (UTC)
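As a stopgap for the "search all schemas that use a property" question raised in this thread, one option is a plain text scan over schema source that has already been downloaded. The sketch below assumes the ShEx text is available locally as a mapping from schema ID to schema text; the sample schema snippets and the function name `schemas_using_property` are invented for illustration and are not the real contents of any schema.

```python
import re
from typing import Dict, List


def schemas_using_property(schemas: Dict[str, str], property_id: str) -> List[str]:
    """Return IDs of schemas whose text mentions the property after a prefix colon."""
    # Matches e.g. wdt:P31, p:P31, ps:P31 but not wdt:P310 (word boundary at the end).
    pattern = re.compile(r":" + re.escape(property_id) + r"\b")
    return sorted(sid for sid, text in schemas.items() if pattern.search(text))


# Invented sample data, standing in for downloaded schema text.
sample = {
    "E10": "wdt:P31 [wd:Q5] ; wdt:P569 . ?",
    "E42": "wdt:P50 . * ; wdt:P310 . ?",
}
print(schemas_using_property(sample, "P31"))
```

A text scan cannot tell whether the property is required, optional or merely mentioned in a comment, so it only narrows down the candidates.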
Data Modelling Days, online gathering, November 30 - December 2, 2023
Hello all,
Following the past events dedicated to data quality and data reuse, the Wikidata team wanted to host a new gathering dedicated to data modelling.
The Data Modelling Days will take place online over three days and will host a variety of discussions, workshops and practical sessions on the topics of Wikidata ontologies, EntitySchemas, modelling issues and various other challenges.
The event is open to everyone, regardless of your experience with modelling data on Wikidata. We particularly encourage people who are working on specific topics to join the event and present their modelling challenges.
If you know people or groups who are already discussing modelling issues on Wikidata, or would have something interesting to contribute, please share this message with them!
You can find more information on the dedicated page, sign up, and let us know what you are interested in. You can already propose discussions and workshops on the talk page until November 19th.
If you cannot attend, don’t worry, most sessions will be recorded, notes will be taken and slides will be shared.
We are looking forward to seeing you and learning more about your modelling challenges during the Data Modelling Days! If you have any questions, feel free to reach out to me. Best, Lea Lacroix (WMDE) (talk) 14:25, 9 October 2023 (UTC)
model schema
Is there a model schema that can be used to determine how schemas are supposed to work? A model schema has to be non-trivial, match what is currently in Wikidata, have some utility, and have a good explanation of this utility. Peter F. Patel-Schneider (talk) 21:29, 31 October 2023 (UTC)
Coming up soon: Wikidata Data Modelling Days, online, November 30-December 2
Hello all,
If you are regularly involved in adding, organizing or reusing data from Wikidata, you have certainly encountered some questions or issues related to data modelling: how to describe and structure information in a consistent way on Wikidata. This is a big topic for the community at large, and that's why we will address it together during a three-day online event, the Data Modelling Days, that will take place next week, on November 30th, December 1st and 2nd.
During this online gathering, we will have lots of discussions on various topics that you can discover in the program: we will talk about Entity Schemas and how they can be useful to improve data quality and consistency on Wikidata, how to model heritage, gender, references or web fiction, the challenges encountered by people reusing Wikidata's data inside and outside the Wikimedia projects, how to model data on a fresh new Wikibase instance, and many other exciting topics.
Aside from attending sessions and joining the discussions, you can also join our Data Modelling Clinic sessions, where you can bring any topic you are working on, ask questions or ask the community for feedback or help. You will find these sessions on each day in the program.
The event is taking place online on the video conference platform Jitsi, it is free, no registration needed (although you are invited to add your name to the participants list). Most sessions will be recorded in video and have collaborative notes, and we will publish a list of outcomes and next steps for each session.
We are hoping to see a lot of you at the event!
If you have any questions, feel free to ask on the talk page or directly by writing to me. Best, Lea Lacroix (WMDE) (talk) 16:02, 24 November 2023 (UTC)
New Schema Validation Tool UI
Hello everyone, As part of my thesis project I've made a new UI mode for the Schema Validator used by Wikidata. I discussed this already during the Data Modelling Days last year, but now the code is ready to use. The new UI represents validation reports as a table rather than a very long string, and replaces most links with hyperlinks with some of the text behind them. I just started hosting it yesterday on https://shex-validator.toolforge.org/packages/shex-webapp/doc/shex-simple-improved.html. I'm also looking for people willing to participate in evaluating the user experience and the ease of use of this new UI in a roughly 1 hour interview sometime in May. For more information, check out my user page. If you want to register for the evaluation interviews, you can do so at https://datumprikker.nl/event/index/fuwv62b5tatqq4vr.
I've taken the liberty of adding my new tool to the tool list on the project page. M.alten.tue (talk) 11:08, 30 April 2024 (UTC)
- @M.alten.tue: I signed up! I have used schema validation processes but I have not seen your tool. Should I use the tool and be familiar with it before meeting you? Thanks for developing this and advancing the conversation. Bluerasberry (talk) 13:54, 30 April 2024 (UTC)
- @Bluerasberry: You don't need to get familiar with it ahead of time, but you can if you want to. Can you confirm to me what time it says for you on your appointment? The website I used is Dutch so it may be using Amsterdam time also for international people. The time you entered is 11:30 my time which should be very early in New York time (I think around 5 or 6 in the morning). I will add a note to the datumprikker for people to check this M.alten.tue (talk) 14:18, 30 April 2024 (UTC)
New Property: P12861 (EntitySchema for class)
A new property, EntitySchema for this class (P12861) (currently called EntitySchema for class), is now available (finally, after a long time). For example, some classes have now been linked to already existing entity schemas, such as human (Q5), natural number (Q21199) and film festival (Q220505). A directory of existing entity schemas that can potentially be linked to the different classes can be found here. John Samuel (talk) 13:23, 3 July 2024 (UTC)
Entityshape userscript updates
With the availability of EntitySchema for this class (P12861), I've added some enhancements to the Entityshape userscript (User:Teester/EntityShape.js).
Multiple Schemas
- You can now check multiple schemas simultaneously by entering the schema identifiers separated by commas (e.g. E10, E236, E257).
- The schema results will be combined in the summary view, with failures taking precedence. So, if you check against 2 schemas, and the property passes one but fails the other, then it will display as failed.
- Results are individually broken down in properties and statements so you can tell which schema a property might have failed against.
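The "failure takes precedence" rule described in the list above can be sketched as follows. This is not the userscript's actual code: the status labels `"pass"` and `"fail"` and the function name are assumptions chosen for illustration.

```python
from typing import Dict


def combine_property_results(results_by_schema: Dict[str, str]) -> str:
    """Combine per-schema results for one property; any failure wins."""
    if not results_by_schema:
        raise ValueError("at least one schema result is required")
    if any(status == "fail" for status in results_by_schema.values()):
        return "fail"
    return "pass"


# The property passes E10 but fails E236, so the summary shows a failure.
print(combine_property_results({"E10": "pass", "E236": "fail"}))
```

Keeping the per-schema dictionary around alongside the combined value is what allows the individual breakdown mentioned in the last bullet.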
Automatic Checking
- The userscript will now try to determine automatically which schemas to check against if you press "check" without anything in the search field.
- It does this by checking for the presence of EntitySchema for this class (P12861) on any of the properties or items in statements of the entity being checked and checking against them all.
- For example, at time of writing, Simon Harris (Q7518922) contains instance of (P31) of human (Q5), which has EntitySchema for this class (P12861) of E10, occupation (P106) of politician (Q82955), which has EntitySchema for this class (P12861) of E257 and Oireachtas member ID (P4690) which has EntitySchema for this class (P12861) of E236, so the userscript will automatically check against E10, E236 and E257 and combine the results.
- You can override this by entering a schema (or schemas) in the search box before pressing "check".
Results should hopefully improve as EntitySchema for this class (P12861) gets more widely used. Feedback is welcome. Teester (talk) 15:21, 5 July 2024 (UTC)
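The automatic detection and the manual override described above can be sketched with in-memory data. Everything here is illustrative rather than the userscript's real implementation: the lookup table stands in for reading EntitySchema for this class (P12861) from Wikidata, the external ID value is a placeholder, and the function names are invented.

```python
from typing import Dict, List

# Stand-in for P12861 lookups, mirroring the Simon Harris example in the post.
SCHEMA_FOR_CLASS: Dict[str, List[str]] = {
    "Q5": ["E10"],
    "Q82955": ["E257"],
    "P4690": ["E236"],
}


def detect_schemas(statements: Dict[str, List[str]],
                   schema_for_class: Dict[str, List[str]]) -> List[str]:
    """Collect schemas linked from the properties and item values of an entity."""
    found = set()
    for prop, values in statements.items():
        found.update(schema_for_class.get(prop, []))
        for value in values:
            found.update(schema_for_class.get(value, []))
    return sorted(found, key=lambda schema_id: int(schema_id[1:]))


def schemas_to_check(search_field: str,
                     statements: Dict[str, List[str]],
                     schema_for_class: Dict[str, List[str]]) -> List[str]:
    """Use the comma-separated search field if filled in, else auto-detect."""
    typed = [part.strip() for part in search_field.split(",") if part.strip()]
    return typed if typed else detect_schemas(statements, schema_for_class)


entity = {"P31": ["Q5"], "P106": ["Q82955"], "P4690": ["placeholder-id"]}
print(schemas_to_check("", entity, SCHEMA_FOR_CLASS))
print(schemas_to_check("E10, E236", entity, SCHEMA_FOR_CLASS))
```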
- @Teester Thank you, this is great! A few suggestions and questions:
- As a feature request, for mandatory or optional statements missing from an item, would it be possible to pre-prepare blank statement fields for a user to then edit a value/qualifiers/references for and submit? This would allow, for example, someone to create a new item which is only an instance of a class that has EntitySchema for this class (P12861), then they can use your EntityShape script to dramatically decrease the time required to add all expected or suggested statements.
- I get "E30: too many statements" as an error when a schema expects a single statement for a property, but there are two, with one of those two having a preferred rank. I may be mistaken, but I thought with a ShEx statement "wdt:Pxx [wd:Qxx];" for expecting a single item, it would respect statement ranks and in this example, validation would be successful? And if someone did want to get into the details of multiple statements of different ranks, their ShEx statements would need to use "p:"/"ps:"/etc prefixes.
- For "E30: Not in schema", perhaps it is necessary to maintain a list of common properties which shouldn't trigger this warning. Examples of such common properties are different from (P1889), hashtag (P2572) and described by source (P1343). I again could be mistaken, but I don't think it's possible to maintain such a list as an EntitySchema item that other EntitySchema items can import.
- For a large item such as Donald Trump (Q22686), the automatic schema detection validates against 12 EntitySchemas and this takes some time to complete, and there is no indication to the user that work is being done behind the scenes. Could a visual indication be provided to users to let them know the validation is in progress?
- For a large item such as Donald Trump (Q22686), the automatic schema detection validates against 12 EntitySchemas and property boxes often don't have enough height to list all validation results, so many get truncated. Could the CSS for property boxes be modified to allow the property box to expand to list all 12 EntitySchema validation results without truncation occurring?
- Dhx1 (talk) 16:07, 5 July 2024 (UTC)
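For reference on the rank question above: in Wikidata's RDF export, the wdt: ("truthy") triples are derived from the best-ranked statements only. The sketch below shows that selection rule on plain tuples; the data layout and function name are assumptions for illustration, and whether a given validator is fed truthy triples or the full statement data is a separate matter.

```python
from typing import List, Tuple


def truthy_values(statements: List[Tuple[str, str]]) -> List[str]:
    """Return values of best-ranked statements.

    Each statement is (value, rank), with rank one of
    "preferred", "normal" or "deprecated".
    """
    preferred = [value for value, rank in statements if rank == "preferred"]
    if preferred:
        return preferred
    # Without preferred statements, normal-rank ones are the best rank.
    return [value for value, rank in statements if rank == "normal"]


# One preferred and one normal statement: only the preferred value is truthy.
print(truthy_values([("Q1", "preferred"), ("Q2", "normal")]))
```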
- > For "E30: Not in schema", perhaps it is necessary to maintain a list of common properties which shouldn't trigger this warning. Examples of such common properties are different from (P1889), hashtag (P2572) and described by source (P1343). I again could be mistaken, but I don't think it's possible to maintain such a list as an EntitySchema item that other EntitySchema items can import.
- This could be done, language wise. I don't know if it is ideal organization wise, I don't have enough experience to judge that.
- You could have a schema containing one shape, let's say <@exceptionalProperties>, which for all properties you want to ignore has a line that goes wdt:P## . *;, meaning each line indicates "0 or more connections to this property with any type of node". Then other schemas can import this schema, and use the shape as per https://shex.io/shex-primer/index.html#labeled-constraints M.alten.tue (talk) 09:04, 6 July 2024 (UTC)
- @Dhx1 Thanks for your feedback.
- Being able to add statements based on the entityschema sounds like an interesting idea, but would need a bit of thinking to implement in a way that was not confusing or that wouldn't cause problems.
- The response is actually correct here. E30 is looking for a single statement in the property and doesn't care about statement ranks unless the schema were to encode it. That being said, entityshape doesn't support statement ranks yet anyway.
- I've been looking at other ways to display the not in schema warnings. My initial thought is to not display the warning on properties in the same way as they are not displayed on statements, and change the not in schema results in the summary to some sort of expandable list with a header of something like "xx properties not in schema".
- I've added a spinner which shows up beside the when the api is being queried.
- Property box height is a legitimate problem that I need to look at.
- Teester (talk) 15:41, 8 July 2024 (UTC)
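The shared "exceptional properties" shape suggested earlier in this thread could be generated from a plain list of property IDs. The sketch below only produces ShEx text; the shape label, the function name and the choice of properties (taken from the examples named in the thread) are illustrative, and it has not been checked whether importing such a shape actually suppresses the "Not in schema" warning in the userscript.

```python
from typing import Iterable


def exceptional_properties_shape(property_ids: Iterable[str],
                                 label: str = "exceptionalProperties") -> str:
    """Build a ShEx shape allowing zero or more values of any kind per property."""
    lines = [f"  wdt:{pid} . *" for pid in property_ids]
    body = " ;\n".join(lines)
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n\n"
        f"<{label}> {{\n{body}\n}}\n"
    )


# different from (P1889), hashtag (P2572), described by source (P1343)
shex_text = exceptional_properties_shape(["P1889", "P2572", "P1343"])
print(shex_text)
```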
automatic notification when an item that is an instance of a class with a schema is edited
Are there any plans to make notification of schema violations automatic? That is, whenever anyone makes an edit to an item that is an instance of a class with a schema they are immediately told about any violations that their edit caused? Peter F. Patel-Schneider (talk) 14:50, 12 August 2024 (UTC)