Wikidata talk:WikiProject Schemas

From Wikidata

Meeting Notes

14 Feb 2018

ericP: what is our mission?

Andra: to raise awareness and grow a community ... we can grow the WikiProject page over time ... use cases for ShEx ... namespace for ShEx: Lucas' idea from WikidataCon, shapes should be hosted on their own URIs; what would that URI be?

ericP: conflicting interests: one is to give ownership to the WD community; the conflicting one is to have the shapes visible so people can copy and steal them even if they are outside the WD community ... maybe move schemas over to WD and then mirror them to the shex schemas space when you want

Lucas: To create a new namespace- prob not trivial. Even if no one argues, the technical side might be complicated.

Andra: maybe postpone this until we have more use cases ... Kat's http://wikidp.org/ demo: can we use this for additional domains by driving it with ShEx, i.e. a property checklist driven by ShEx? ... We could create a generic version of the portal, containerize it, and then people could slot in their own shape expressions to create their own property checklists ... shapes need to be available through a URL to reuse Harold Solbrig's PyShEx, so that is why I need a namespace for shape URIs

ericP: demo manifests to run in Eric's or Jose's implementation- like the primer try it links- 1. create manifests so ... good queries and validation tests either that are picked up remotely, or static data, create the schemas that will be shared, demo data, and manifests in a picklist ... demos show why validation is useful, hints on how is used in different domains, give people ideas ... wiki page with try it links, if we have a data structure, we can express it like this, that catches errors like this, help people

Andra: create a page similar to the example queries


TODOs for next meeting:

   Lucas- ask around WMDE about how to request a new namespace
   Kat- create an example on the WikiProject page
   Andra- create an example on the WikiProject page
   Kat- paste notes in the talk page of the WikiProject
  ? Create phabricator ticket for a new namespace?

Examples and tools

Andra Waagmeester Andrawaag (talk) 19:33, 30 January 2018 (UTC) YULdigitalpreservation (talk) 13:32, 6 February 2018 (UTC) Daniel Mietchen (talk) 01:52, 7 February 2018 (UTC) Finn Årup Nielsen (fnielsen) (talk) 13:55, 13 February 2018 (UTC) Lucas Werkmeister (talk) 12:34, 14 February 2018 (UTC) John Samuel 20:31, 26 February 2018 (UTC) Dhx1 (talk) 02:39, 8 March 2018 (UTC) Jneubert (talk) 13:35, 19 June 2018 (UTC) User:Malore Malore (talk) 15:59, 24 August 2018 (UTC) Vladimir Alexiev (talk) 06:33, 10 September 2018 (UTC) Jose Emilio Labra Gayo (talk) 19:34, 21 November 2018 (UTC) Spinster 💬 08:45, 18 December 2018 (UTC) Egon Willighagen (talk) 07:43, 5 March 2019 (UTC) EricP (talk) 10:44, 14 March 2019 (UTC) Tombakerii (talk) 15:03, 17 May 2019 (UTC) Maxlath (talk) 13:26, 19 May 2019 (UTC) Jumtist (talk) 13:29, 19 May 2019 (UTC) SilentSpike (talk) 13:48, 19 May 2019 (UTC) MisterSynergy (talk) 19:17, 19 May 2019 (UTC) --Harmonia Amanda (talk) 06:32, 20 May 2019 (UTC) Salgo60 (talk) 09:07, 20 May 2019 (UTC) Ivanhercaz (Talk) 15:38, 20 May 2019 (UTC) Andrew Su (talk) 15:50, 20 May 2019 (UTC) Mlemusrojas (talk) 16:50, 21 May 2019 (UTC) Dani Fernandez 14:11, 23 May 2019 (UTC) PKM (talk) 02:43, 29 May 2019 (UTC) Sannita - not just another it.wiki sysop 09:47, 2 June 2019 (UTC) Infomuse (talk) 22:37, 3 June 2019 (UTC) Buccalon (talk) 17:42, 18 June 2019 (UTC) author  TomT0m / talk page 11:52, 30 June 2019 (UTC) Ecritures (talk) 20:08, 15 July 2019 (UTC) Fuzheado (talk) 17:03, 10 July 2019 (UTC) Iovka Boneva (Iovka) Csisc (talk) 20:43, 24 August 2019 (UTC) Fuzheado (talk) 18:01, 23 October 2019 (UTC) Ash Crow (talk) Pdehaye (talk) 22:13, 27 October 2019 (UTC) Tinker Bell 20:18, 1 November 2019 (UTC) So9q (talk) 06:26, 13 November 2019 (UTC) ElanHR (talk) 21:29, 14 November 2019 (UTC) Arybolab (talk) Blue Rasberry (talk) 14:21, 24 November 2019 (UTC) Susanna Ånäs (Susannaanas) (talk)


Notified participants of WikiProject ShEx

Could you please first provide examples of ShEx shapes that check particular data models in Wikidata, and guidelines on how to check Wikidata against these shapes? I'd prefer

  • a web form tailored to Wikidata to edit and check shape expressions with syntax highlighting and typeahead, similar to the Wikidata Query Service
  • a bot that regularly runs ShEx given at Wiki pages and posts the results, such as User:ListeriaBot

-- JakobVoss (talk) 07:15, 22 February 2018 (UTC)

I've just added a "Tutorials and examples" section to the project homepage, with a very basic example of how to get started with ShEx2. Please help improve it! (Thanks to Eric for fixing two minor issues in ShEx2!) Jneubert (talk) 12:06, 25 June 2018 (UTC)
Updated version of How to get started with ShEx on Wikidata? Please help improve it. --Jneubert (talk) 14:35, 25 July 2019 (UTC)

Wikidata ShEx Inference tool

Notified participants of WikiProject ShEx

Hi folks! I’ve been working on a tool to automatically infer ShEx schemas from Wikidata items, and a first version of the tool is now available at toolforge:wd-shex-infer (documentation). I would be very thankful if you could try it out and let me know how it works for you, preferably within the next two weeks (the tool will stay available after that, but eventually I’ll have to write and hand in my thesis). Let me know if you have any questions! --Lucas Werkmeister (talk) 12:33, 16 August 2018 (UTC)

Some initial observations: This tool is a great idea and could potentially become very useful — thanks! It's understandable that only a small number of jobs can be run at any time, but it would be nice to be able to submit jobs into a queue if they cannot be run immediately. The tool tips when exploring the ShEx results are helpful. I haven't seen references covered in the ShEx output, but it would be handy to be able to run some jobs specifically to explore the data model used for references on items of particular types. --Daniel Mietchen (talk) 02:29, 18 August 2018 (UTC)
@Daniel Mietchen: thanks! I’ll think about adding a job queue, depending on how many people use the tool. And currently, qualifiers and references are ignored, yes – I’m afraid that the way RDF2Graph works doesn’t really work well with them (it heavily relies on “instance of” and “subclass of” relations, so it would see all statement and reference nodes as equivalent, since they all have the type wikibase:Statement/wikibase:Reference). It might be possible to fix that, but I don’t think I’ll have time for that before my thesis is done. --Lucas Werkmeister (talk) 12:14, 22 August 2018 (UTC)
Friendly reminder that the next few days would be an especially helpful time for feedback :) it should also be possible to run two jobs at once now. Please let me know if there are any problems! --Lucas Werkmeister (talk) 17:56, 28 August 2018 (UTC)
I’ve also updated the tool to fix several problems with the simplification step, so now the schemas should look much nicer. For example, compare the shape for human (Q5) between job #11 and job #29 (both for “films that won ten or more Oscars”): five target classes for nominated for (P1411) were merged into one (award (Q618779)), as were nine target classes for award received (P166); eight target classes for country of citizenship (P27) were merged into two (political territorial entity (Q1048835) and political system (Q28108) – that second one is probably a bug in the data); and so on. You might even see completely new predicates be mentioned, because the tool drops any predicate with more than ten possible target classes (rationale: that’s pointless noise), so predicates which would previously have been dropped might now be included due to the target classes being merged. If you were dissatisfied with the schemas before, perhaps take another look? :) --Lucas Werkmeister (talk) 15:49, 6 September 2018 (UTC)
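The two simplification rules described above can be sketched in Python. The first merges target classes into broader ones, and the second drops any predicate that still has more than ten target classes. This is a minimal illustration, not the tool's code. The dictionary layout (predicate ID mapped to a set of class IDs) and the `superclass_of` mapping are assumptions, and how the tool actually chooses the merged class is not shown. The IDs Q1 to Q11 are placeholders.

```python
from typing import Dict, Set

# Assumed layout of an inferred shape: predicate ID -> set of target class IDs.
InferredShape = Dict[str, Set[str]]

MAX_TARGET_CLASSES = 10  # more than ten target classes is treated as noise


def merge_into_superclasses(shape: InferredShape, superclass_of: Dict[str, str]) -> InferredShape:
    """Replace each target class by its mapped superclass; unmapped classes are kept.

    `superclass_of` is a hypothetical, precomputed mapping (e.g. derived from
    subclass of (P279)); picking good superclasses is the hard part and is not shown.
    """
    return {
        pred: {superclass_of.get(cls, cls) for cls in classes}
        for pred, classes in shape.items()
    }


def drop_noisy_predicates(shape: InferredShape, limit: int = MAX_TARGET_CLASSES) -> InferredShape:
    """Keep only predicates with at most `limit` distinct target classes."""
    return {pred: classes for pred, classes in shape.items() if len(classes) <= limit}


if __name__ == "__main__":
    # Placeholder data: eleven award classes for P166, one class for P27.
    shape = {"P166": {f"Q{i}" for i in range(1, 12)}, "P27": {"Q1048835"}}
    print(sorted(drop_noisy_predicates(shape)))   # ['P27']: P166 has 11 > 10 classes
    merged = merge_into_superclasses(shape, {f"Q{i}": "Q618779" for i in range(1, 12)})
    print(sorted(drop_noisy_predicates(merged)))  # ['P166', 'P27']: merged into award (Q618779)
```

Merging before filtering is what allows a predicate that would previously have been dropped to reappear, as noted above.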

You can now try Shape Expressions on a test system

Notified participants of WikiProject ShEx

Hello all,

The Wikidata team has started working on support for Schemas, specifically Shape Expressions. A new extension will be integrated into Wikidata so that Schemas can be stored and reused there.

It’s still in development, but we wanted to share the first results with you, so you can give us early feedback.

On the test system, one can create and edit Schemas. You can see an example Schema here.

Please note that the multilingual labels, descriptions and aliases are not enabled for now, this is the next step we will work on. After that we will work on linking to a tool that allows you to check the Schema against a list of Items.

If you have any questions or remarks at that stage, please let me know by replying to this section :) If you want to create Phabricator tickets, you can use the tag Shape Expressions.

Cheers, Lea Lacroix (WMDE) (talk) 14:13, 26 February 2019 (UTC)

  • Thanks for letting us know. I just tried it out and created O10. YULdigitalpreservation (talk) 19:29, 26 February 2019 (UTC)
  • Sorry for the delayed reply. This is cool! I finally got around to trying it, and will put my (few) ShEx schemas there. --Egon Willighagen (talk) 09:50, 22 April 2019 (UTC)

Improvements on ShEx test system

Notified participants of WikiProject ShEx

Hello all,

Our developers keep working hard on Shape Expressions, and we would love to have your feedback on the current version :)

Here's what has been improved recently:

  • the "termbox" area of the page now displays several languages
  • if you switch your interface from English to a language in which the label is filled in, the title of the page changes accordingly
  • if you want to add a label/description in a new language, there are two options: switch your interface to this new language, and an editable line will appear in the table; or edit the URL directly to access the special page, e.g. https://wikidata-shex.wmflabs.org/wiki/Special:SetSchemaLabelDescriptionAliases/O2/fr
  • there is no longer an edit button at the top of the page; instead, the different sections are independently editable
  • a new special page, Special:SchemaText, serves the raw text of the Schema as a plain-text page. Example: https://wikidata-shex.wmflabs.org/wiki/Special:SchemaText/O2

And here's what is coming next:

  • the "edit" buttons will be translated into the language of your interface
  • we will add a button to check the schema in the validator tool

Feel free to try the interface on the test system, create new schemas, play around. If you find any issue, or if there is a feature/improvement that you would like to add, please let me know :)

Cheers, Lea Lacroix (WMDE) (talk) 09:14, 14 March 2019 (UTC)

  • One thing that comes to mind is to be able to indicate what items the expression should run on. At this moment I am not entirely sure how to 'run' my ShEx on Wikidata. --Egon Willighagen (talk) 09:51, 22 April 2019 (UTC)
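Because Special:SchemaText serves a Schema without any page markup, a script can fetch it directly, which is a first step towards running a schema from outside the wiki. Below is a minimal Python sketch that uses only the standard library. The test-system host is the one given in the announcement above and may no longer be online. The `Special:EntitySchemaText` base for the production site comes from a later thread on this page. The function and constant names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Bases as named on this talk page; the test-system host may have been retired.
TEST_SYSTEM_BASE = "https://wikidata-shex.wmflabs.org/wiki/Special:SchemaText/"
WIKIDATA_BASE = "https://www.wikidata.org/wiki/Special:EntitySchemaText/"


def schema_text_url(schema_id: str, base: str = TEST_SYSTEM_BASE) -> str:
    """Build the URL of the raw-text view of a Schema, e.g. O2 on the test system."""
    return base + quote(schema_id)


def fetch_schema_text(schema_id: str, base: str = TEST_SYSTEM_BASE, timeout: float = 30.0) -> str:
    """Download the raw ShEx text of a Schema (requires network access)."""
    with urlopen(schema_text_url(schema_id, base), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, `fetch_schema_text("E10", WIKIDATA_BASE)` should return the ShExC text of E10 if that page is reachable. The result can then be handed to any ShEx validator.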

Please explain

What is the purpose, and how will it affect the existing structure? That remains opaque to me; nobody has been able to explain it to me (I have asked repeatedly). What are the material benefits of this approach? Thanks, GerardM (talk) 16:45, 16 March 2019 (UTC)

Community requirements for data integrity

@GerardM: Hi, for completeness and to make sure we're addressing your issues, could you link to your previous requests for explanation?

While not all Wikidata communities require or even desire validation, it is essential for some of the more complex ones, e.g. GeneWiki (cf. GeneWiki grant proposal). Such validation can be hand-rolled, but a standard schema language offers obvious advantages in tooling, completeness and ease of maintenance. Compiling even a simple ShEx schema to SPARQL produces a 10-100x explosion in line noise. Scripting something with conjunctions of JSON path expressions would require investment in tooling, as well as maintenance of a corpus of rules to enforce cardinality, datatype consistency and structural coherence. It would be possible to invent a Wikidata-specific schema language, but it would lack the tooling support that ShEx offers (validators in five languages, form generation, import from UML/XMI, etc.).

I've witnessed many publicly-curated databases lose relevance as their data rotted over time or changed structure so that potential users gave up trying to track it. Open PHACTS was founded specifically to provide integrity and consistency to Linked Data. Domain-specific databases typically have greater institutional investment because they offer integrity and consistency backed by schemas (e.g. UniProt, whose RDF structure reflects a conventional SQL (DDL) schema for genes and proteins). General knowledge stores have to add schema validation because their native schema is not domain-specific but instead one of generalized assertions, which can express incoherent data structures as easily as coherent ones.

Of course not all communities demand validation, but I believe that the offer of testable contracts to ensure the longevity and institutional investment in Wikidata more than justifies this effort.

--EricP (talk) 07:00, 18 March 2019 (UTC)
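To illustrate the "line noise" point above, here is a rough, hand-written Python helper. It turns a single ShEx-style triple constraint (a predicate, a value set and a cardinality) into a SPARQL query that lists nonconforming items. It is only a sketch. It covers the counting part of one constraint, ignores EXTRA handling and every other ShEx feature, and assumes the `wd:`/`wdt:` prefixes predefined on the Wikidata Query Service. It is not a real ShEx-to-SPARQL compiler.

```python
from typing import Optional, Sequence


def violations_query(
    target_class: str,
    predicate: str,
    value_set: Sequence[str],
    min_count: int = 1,
    max_count: Optional[int] = None,
) -> str:
    """SPARQL listing instances of `target_class` whose number of `predicate`
    values drawn from `value_set` falls outside [min_count, max_count]."""
    values = " ".join(value_set)
    having = f"COUNT(?v) < {min_count}"
    if max_count is not None:
        having += f" || COUNT(?v) > {max_count}"
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31 {target_class} .\n"
        "  OPTIONAL {\n"
        f"    ?item {predicate} ?v .\n"
        f"    VALUES ?v {{ {values} }}\n"  # only values from the set are counted
        "  }\n"
        "}\n"
        "GROUP BY ?item\n"
        f"HAVING ({having})"
    )


if __name__ == "__main__":
    # Roughly the one-line ShEx constraint `wdt:P21 [wd:Q6581097 wd:Q6581072]` on humans.
    print(violations_query("wd:Q5", "wdt:P21", ["wd:Q6581097", "wd:Q6581072"], 1, 1))
```

A single ShEx line already becomes about ten lines of SPARQL, and this helper still omits the rule that, without EXTRA, values outside the set make the item fail.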

When technology is introduced that enforces particular behaviour, it is all too easy to use the same technology elsewhere when, at first glance, a similar situation exists. So you have been abstract in your answer, and it does not satisfy me. I am familiar with SwissProt/UniProt from my Wikiprotein days. I know that Wikidata is not as good as Wikiprotein used to be. The quality of the data is not the issue; the issue is that a schema enforces. It follows that a certain "completeness" will be enforced, and that is not necessarily a good thing. What I learned at Wikiprotein is how vital it is that people include information that is valid but not necessarily complete.
In conclusion, what EXACTLY do you aim to achieve/enforce? Thanks, GerardM (talk) 11:11, 18 March 2019 (UTC)
ShEx, or any schema language, is not about enforcing; it is instrumental in checking for conformance. As a data consumer, I want to be able to check data consistency against relevant data models. Relevant to me, not necessarily to you. There are many cases where, even within a single application, multiple schemas could apply, depending on the use case. As you say, it is crucial that people include data that is valid, not necessarily complete. There is no intention to enforce, only to be able to check validity. --Andrawaag (talk) 11:40, 18 March 2019 (UTC)
You asked EXACTLY what we aim to enforce. It would be tedious to enumerate everything, but as an example, in Gene Wiki we want to know when an item on a protein doesn't have properties related to genes (e.g. chromosomal location), AND when a genomic build is missing as a qualifier on the gene location statement, which makes the statement nonsensical. When these inconsistencies occur, having flags that indicate them as part of the workflow tremendously helps in curating protein and gene information. Early prototypes of this system have already helped me fix errors. --Andrawaag (talk) 12:10, 18 March 2019 (UTC)
That makes perfect sense. So, in conclusion, the intention is to signal structural issues in order to help people insert sensible information, and to use the schema as a template to query those records that fail a "sanity" check. Thanks, GerardM (talk) 15:16, 18 March 2019 (UTC)
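The two Gene Wiki checks described in this thread can be written as a small Python function over a simplified item record. Everything here is an assumption for illustration. The dictionary layout is a stripped-down, hypothetical stand-in, not Wikidata's real JSON. The property IDs genomic start (P644) and genomic assembly (P659) are assumed to be the relevant ones and should be checked before use. The argument in the thread is that a ShEx shape states the same rules declaratively, without code like this.

```python
from typing import Any, Dict, List

# Assumed property IDs (verify before use): genomic start and genomic assembly.
GENOMIC_START = "P644"
GENOMIC_ASSEMBLY = "P659"

# Hypothetical simplified item layout:
# {"claims": {prop_id: [{"value": ..., "qualifiers": {prop_id: [values]}}]}}
Item = Dict[str, Any]


def gene_location_issues(item: Item) -> List[str]:
    """Return human-readable flags for the two structural problems described above."""
    issues: List[str] = []
    locations = item.get("claims", {}).get(GENOMIC_START, [])
    if not locations:
        issues.append("no chromosomal location statement")
    for index, statement in enumerate(locations):
        # A location without the genomic build it refers to cannot be interpreted.
        if not statement.get("qualifiers", {}).get(GENOMIC_ASSEMBLY):
            issues.append(f"location statement {index} lacks a genomic build qualifier")
    return issues
```

An empty list means the item passes both checks. Each string is one flag that could be surfaced in a curation workflow.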

Update documentation

Hello dear ShEx enthusiasts!

Because we will release Schemas on Wikidata very soon, I'm currently reviewing the existing documentation. When I announce it, I expect a lot of people in the Wikidata community to wonder "what is it exactly? How can I write my own?"

The main links I'll redirect people to are your WikiProject page and Wikidata:WikiProject ShEx/How to get started?. Is this second page still up to date from your point of view?

I think that now would be a good time to give the presentation of shape expressions a bit of polish. On the development team side, we will add technical documentation about the new extension and data type.

If you have any question or wish, feel free to ping me. Cheers, Lea Lacroix (WMDE) (talk) 15:11, 23 April 2019 (UTC)

Shape Expressions arrive on Wikidata on May 28th

See full announcement on the Project Chat :)

Thanks a lot to all of you who have been involved in discussing, suggesting improvements, testing the feature! Lea Lacroix (WMDE) (talk) 13:30, 19 May 2019 (UTC)


Notified participants of WikiProject ShEx

Hello all,
As announced here, we just released shape expressions on Wikidata. You can for example have a look at E10, the shape for human, or create a new EntitySchema.
A few useful links:
If you have any question or encounter issues, feel free to ping me. Cheers, Lea Lacroix (WMDE) (talk) 16:07, 28 May 2019 (UTC)
Indeed it's CC0. Thanks for the reminder! I created a ticket. Lea Lacroix (WMDE) (talk) 07:18, 29 May 2019 (UTC)

Are the following validations possible?

1. Ensure that at least one statement for a given property (where multiple statements exist) has a value in a specified value set. If other statements exist for the property, ignore them. For example, validate that an item has at least the statement instance of (P31) sovereign state (Q3624078), but may also have other instance of (P31) statements that should be ignored.

- 1. Yes, the keyword EXTRA says that other values of the property may appear. This is common for P31. This example shows a schema with a simple value set [<Qx> <Qy>]. (In many schemas, that's a value set of 1 element.) <Q2> fails <WithoutExtra> because it has an extra P31 (outside the value set), but it passes <WithExtra>. I added a <Q3> which has two P31s within the value set. There you don't need an EXTRA; instead, you need to increase the number of expected P31s matching the value set. I added +, which is shorthand for {1,}, i.e. a minimum of 1 and an unlimited maximum. --EricP (talk)
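The behaviour described above for EXTRA and `+` can be modelled for a single triple constraint whose values come from a value set. This simplified Python model is not a ShEx validator. It assumes the constraint is the only one in the shape that mentions the predicate, and Qx, Qy and Qz are placeholders for real items.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ValueSetConstraint:
    """One triple constraint such as `wdt:P31 [<Qx> <Qy>]`, simplified."""
    values: FrozenSet[str]
    min_count: int = 1            # ShEx default cardinality is {1,1}
    max_count: Optional[int] = 1  # None means unbounded, as with `+` ({1,})
    extra: bool = False           # True models `EXTRA wdt:P31` on the shape


def conforms(objects: Iterable[str], c: ValueSetConstraint) -> bool:
    """Check the values a focus node has for the constrained predicate."""
    values = set(objects)
    matching = values & c.values
    outside = values - c.values
    if outside and not c.extra:
        return False  # without EXTRA, a value outside the set makes the node fail
    if len(matching) < c.min_count:
        return False
    return c.max_count is None or len(matching) <= c.max_count


if __name__ == "__main__":
    value_set = frozenset({"Qx", "Qy"})
    q2, q3 = {"Qx", "Qz"}, {"Qx", "Qy"}
    print(conforms(q2, ValueSetConstraint(value_set)))                  # False: Qz, no EXTRA
    print(conforms(q2, ValueSetConstraint(value_set, extra=True)))      # True: Qz ignored
    print(conforms(q3, ValueSetConstraint(value_set)))                  # False: 2 matches > {1,1}
    print(conforms(q3, ValueSetConstraint(value_set, max_count=None)))  # True: `+` allows 2
```

The two fixes are independent. EXTRA tolerates values outside the set, while `+` raises the allowed number of values inside it.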

2. Extract data on linked Wikidata items using EXTERNAL (?) or some other technique, allowing a country (P17) statement to be validated to ensure the linked item has a statement instance of (P31) sovereign state (Q3624078).

- 2. Yes, but you don't need EXTERNAL. If I understand the question, you just want your constraints to link to another resource in the wikidata world. I created a shape for national flags as an example. It has a constraint ```wdt:P17 { wdt:P31 [wd:Q3624078] }``` (which 90% of flags fail, but...) to say that the NationalFlag must have a country with a given type. --EricP (talk)

Dhx1 (talk) 18:35, 28 May 2019 (UTC)
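The national-flag constraint `wdt:P17 { wdt:P31 [wd:Q3624078] }` nests an inline shape, so the flag's country must itself conform to `{ wdt:P31 [wd:Q3624078] }`. The Python model below shows what that means under ShEx's default cardinality {1,1} with no EXTRA. It runs over a hypothetical in-memory graph with made-up node names and is only an illustration, not how any ShEx implementation works.

```python
from typing import Dict, Set

# Hypothetical in-memory graph: node -> predicate -> set of objects.
Graph = Dict[str, Dict[str, Set[str]]]

SOVEREIGN_STATE = "wd:Q3624078"


def objects(g: Graph, node: str, predicate: str) -> Set[str]:
    return g.get(node, {}).get(predicate, set())


def country_shape(g: Graph, node: str) -> bool:
    """Inline shape `{ wdt:P31 [wd:Q3624078] }`: exactly one P31, and it is Q3624078.

    Without EXTRA, any other P31 value on the country makes it fail.
    """
    return objects(g, node, "wdt:P31") == {SOVEREIGN_STATE}


def national_flag_shape(g: Graph, flag: str) -> bool:
    """`wdt:P17 { ... }`: exactly one P17 value, which must conform to the inline shape."""
    countries = objects(g, flag, "wdt:P17")
    return len(countries) == 1 and all(country_shape(g, c) for c in countries)


if __name__ == "__main__":
    g: Graph = {
        "flagA": {"wdt:P17": {"stateA"}},
        "stateA": {"wdt:P31": {SOVEREIGN_STATE}},
        "flagB": {"wdt:P17": {"stateB"}},
        "stateB": {"wdt:P31": {SOVEREIGN_STATE, "wd:Q6256"}},  # also typed as country
    }
    print(national_flag_shape(g, "flagA"), national_flag_shape(g, "flagB"))  # True False
```

Adding `EXTRA wdt:P31` to the inline shape would relax `country_shape` to a membership test (`SOVEREIGN_STATE in objects(g, node, "wdt:P31")`).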

Validate in Blazegraph/query server ?

It would be interesting if these schemas could be used directly on the query server, i.e. to filter for items that match, check whether items match, and list errors. --- Jura 10:38, 29 May 2019 (UTC)

Running validation with API access (i.e. getStatements()) would greatly accelerate validation and reduce parsing and serialization effort on the query server. ---EricP (talk)

Structure e-entities ?

There are a few essential, but secondary elements sometimes included on entities:

  • queries of items that could be validated
  • lists of prefixes

I think the first could easily go into the long announced "query"-namespace. The second could probably be assumed in the configuration of whatever tool one uses, at least if they are WD prefixes. --- Jura 10:38, 29 May 2019 (UTC)

  • Associating an item with each shape could help link the queries. --- Jura 09:36, 30 May 2019 (UTC)
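One way to implement Jura's suggestion that tools assume the standard Wikidata prefixes is the Python sketch below. It prepends any commonly used Wikidata prefix that a schema does not declare itself. The table lists the usual Wikidata RDF namespaces. Whether a tool should inject them is a design choice; this is not the existing behaviour of any tool mentioned here.

```python
import re
from typing import Dict

# Commonly used Wikidata RDF namespaces.
WIKIDATA_PREFIXES: Dict[str, str] = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
    "pr": "http://www.wikidata.org/prop/reference/",
    "prov": "http://www.w3.org/ns/prov#",
    "wikibase": "http://wikiba.se/ontology#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

# Captures the name in a `PREFIX name: <...>` line (the name may be empty).
PREFIX_DECL = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w.-]*)?\s*:", re.IGNORECASE | re.MULTILINE)


def with_default_prefixes(schema: str, defaults: Dict[str, str] = WIKIDATA_PREFIXES) -> str:
    """Prepend declarations for default prefixes that the schema does not declare itself."""
    declared = set(PREFIX_DECL.findall(schema))
    missing = [f"PREFIX {name}: <{iri}>" for name, iri in defaults.items() if name not in declared]
    return "\n".join(missing + [schema]) if missing else schema
```

Prefixes the schema declares itself always take precedence, so a schema that binds `wd:` differently is left alone.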

links between entity schemas

I couldn't figure out how to refer from one entity schema to another: for instance, I would like to be able to write something like wdt:P629 @<someprefix:E35> from the E36 entry point. Is that possible? Is it the right pattern to have several EntitySchemas describe different shapes of a schema? Pinging the ShExperts ;) @Andrawaag, YULdigitalpreservation, Jelabra, Tombakerii: -- Maxlath (talk) 15:35, 29 May 2019 (UTC)

I am definitely a ShEx beginner as well, but I have found the import command as described in [1] and [2], which looks promising. You can access the raw ShEx schema code via Special:EntitySchemaText (e.g. Special:EntitySchemaText/E10).
Unfortunately, I can't get it to work in the shex-simple tool, and I am not sure whether this is due to my poor ShEx skills or to a bug in the tool (error message is: "failed to create validator: loadImports@https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/browser/shex-webapp-webpack.js:53845:9 …"). --MisterSynergy (talk) 09:31, 30 May 2019 (UTC)
I'll dive into this. @MisterSynergy, can you pass me an experiment that failed, and I'll see if I can tweak it to make it succeed? (One requirement of IMPORT <XXX> is that XXX returns the schema without any HTML around it; another is that we don't get defeated by CORS issues, which require administration beyond my fingertips.) -- EricP (talk)
For instance this one (sorry for the non-clickable link, there are several unmasked characters which I don't want to change in order not to break the link):
  • https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?schema=PREFIX%20rdf%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20%3A%20%3Chttps%3A%2F%2Fwww.example.org%2F%23%3E%0A%0Aimport%20%3Chttps%3A%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE48%3E%0Astart%20%3D%20%40%3Asportsperson%0A%0A%3Asportsperson%20EXTRA%20wdt%3AP106%20{%0A%20%20wdt%3AP106%20[%20wd%3AQ2066131%20]%3B%0A%23%20wdt%3AP22%20%40%3Chuman%3E%0A}&data=Endpoint%3A%20https%3A%2F%2Fquery.wikidata.org%2Fsparql&shape-map=SPARQL%20%27%27%27SELECT%20DISTINCT%20%3Fid%20WHERE%20{%20%3Fid%20wdt%3AP106%20wd%3AQ2066131%3B%20wdt%3AP22%20[]%20}%20LIMIT%2010%27%27%27@START&interface=human&regexpEngine=threaded-val-nerr
It uses EntitySchema:E48 via Special:EntitySchemaText/E48 (raw ShEx without any HTML around it; just click on it). I already tried several things, including this older version of E48 with prefixes. Note that E48 does not have a "start" command, as required for imported shape expressions. In the shex-simple tool, you'll see that the line that would actually make use of the imported ShEx is commented out, because it does not work anyway.
The error message displayed in Google Chrome is failed to create validator TypeError: Cannot read property 'keepImports' of undefined at loadImports (https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/browser/shex-webapp-webpack.js:53845:23). Sounds like a Javascript issue, but I am not very experienced with that… Thanks for investigating, —MisterSynergy (talk) 15:28, 30 May 2019 (UTC)
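The shex-simple link above is hard to read because every parameter is percent-encoded. Below is a Python sketch that builds the same kind of link from plain strings. The parameter names (`schema`, `data`, `shape-map`, `interface`) are copied from that URL, and the others are omitted. The tool's location may have changed since, and the helper name is hypothetical.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

SHEX_SIMPLE = (
    "https://tools.wmflabs.org/shex-simple/wikidata/packages/"
    "shex-webapp/doc/shex-simple.html"
)


def shex_simple_link(schema: str, data: str, shape_map: str, interface: str = "human") -> str:
    """Percent-encode the inputs (spaces become %20, as in the link above)."""
    query = urlencode(
        {"schema": schema, "data": data, "shape-map": shape_map, "interface": interface},
        quote_via=quote,
    )
    return f"{SHEX_SIMPLE}?{query}"


if __name__ == "__main__":
    schema = "\n".join([
        "PREFIX wd: <http://www.wikidata.org/entity/>",
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        "PREFIX : <https://www.example.org/#>",
        "import <https://www.wikidata.org/wiki/Special:EntitySchemaText/E48>",
        "start = @:sportsperson",
        ":sportsperson EXTRA wdt:P106 {",
        "  wdt:P106 [ wd:Q2066131 ];",
        "}",
    ])
    link = shex_simple_link(
        schema,
        "Endpoint: https://query.wikidata.org/sparql",
        "SPARQL '''SELECT DISTINCT ?id WHERE { ?id wdt:P106 wd:Q2066131; wdt:P22 [] } LIMIT 10'''@START",
    )
    # Decoding the query string recovers the original schema text unchanged.
    assert parse_qs(urlsplit(link).query)["schema"][0] == schema
    print(link)
```

Keeping the schema and shape map as readable strings in a script makes such experiments easier to share and edit than the encoded URL.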
One engineering decision is whether that import should be purely textual, like C's *#include*, or whether the prefixes (and their inclusion) should appear in the JSON (ShExJ) and RDF (ShExR) versions of the schema. You may want to raise a language issue with the tag "enhancement". -- EricP (talk)
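For the purely textual option (import as an #include), a naive resolver is short. The Python sketch below replaces each `IMPORT <iri>` line with the fetched schema text, and skips IRIs it has already included to avoid cycles. It illustrates only that option; it is not how shex.js or PyShEx handle IMPORT. `fetch` is a caller-supplied function, for example one that downloads Special:EntitySchemaText pages. Duplicate `start` declarations and conflicting prefixes are not handled.

```python
import re
from typing import Callable, Optional, Set

# An IMPORT directive on its own line; ShExC keywords are case-insensitive.
IMPORT_LINE = re.compile(r"^[ \t]*IMPORT[ \t]+<([^>]*)>[ \t]*$", re.IGNORECASE | re.MULTILINE)


def inline_imports(schema: str, fetch: Callable[[str], str], seen: Optional[Set[str]] = None) -> str:
    """Textually splice imported schemas into `schema`, depth-first."""
    seen = set() if seen is None else seen

    def splice(match: "re.Match[str]") -> str:
        iri = match.group(1)
        if iri in seen:
            return f"# already imported: <{iri}>"  # ShExC line comment
        seen.add(iri)
        return inline_imports(fetch(iri), fetch, seen)

    return IMPORT_LINE.sub(splice, schema)
```

Because the result is plain ShExC, any validator can load it without IMPORT support. The cost is that prefix declarations from different schemas end up mixed in a single document.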

What to do with duplicate schemas?

Notified participants of WikiProject ShEx

Hi all, since people are already writing their own schemas, and since we still haven't set up a list of all existing ones, there are already a couple of schemas that are basically the same, like E10, E14 and E48. What should we do in this case? Do we delete them or "reuse" them? --Sannita - not just another it.wiki sysop 14:32, 11 June 2019 (UTC)

Hello,
This doesn't directly answer your question, but I wanted to point to a few tickets; we will continue improving the software in the future.
Cheers, Lea Lacroix (WMDE) (talk) 08:58, 13 June 2019 (UTC)
Some more input: I do not think that we should be concerned about duplicates at this point. ShEx is a relatively new feature, there is quite a lot of development work going on, and the community still needs to become familiar with it. According to [3], not many EntitySchemas have been created so far. Later, we will probably want to either merge duplicates (i.e. redirect the E-numbers) or simply allow "duplicate" EntitySchemas. Reusing them does not seem to be a good idea, though. --MisterSynergy (talk) 09:24, 13 June 2019 (UTC)

CheckShex UserScript

I thought this project might be interested in a new userscript named CheckShex. It adds a field to items, properties and lexemes where you can enter an EntitySchema; it then reports whether the entity passes or fails validation against it. It also adds a field to EntitySchema pages, where you can do the reverse. The userscript can be installed by adding User:Teester/CheckShex.js to your common.js.

The userscript is backed by an API based on PyShEx (Q51672520). The API is located at https://tools.wmflabs.org/pyshexy/api and details about its use are at https://tools-static.wmflabs.org/pyshexy/. Teester (talk) 11:56, 22 June 2019 (UTC)

Thanks for this great tool! Sometimes, however, I get strange results: checking Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202) against "E94", I get "Pass Fail" as the message. When I hit "Check" again, I get "Fail". This behaviour does not seem to be reproducible, but I encountered it once for 20th century press archives (Q36948990), too. A hint may be that hitting "Check" again on an item page after "Pass" always results in "Fail". When checked from the E94 page, both items consistently pass. Jneubert (talk) 09:55, 20 July 2019 (UTC)
Thanks. There was a bug in the userscript: when you hit "Check" more than once, the schema was checked against itself rather than the item being checked against the schema. I wonder if the "Pass Fail" behaviour comes from clicking "Check" a second time before the first check was complete, which would run into that bug.
Looking at the items, Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202) currently fails against E94 because of a missing parent organization (P749), while 20th century press archives (Q36948990) currently passes. I get the same result with both the user script and the ShEx2 Validator. For the ShEx2 Validator, a query like this gets you just that item to validate:
SELECT ?item WHERE {BIND(wd:Q36948990 as ?item)} LIMIT 1
Let me know if there are any other bugs or problems. Teester (talk) 14:21, 20 July 2019 (UTC)
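The single-item query above generalizes to several items with a SPARQL `VALUES` clause, which is handy for re-checking both items discussed here in one go. A minimal Python sketch; the helper name `items_query` is made up for illustration, and it assumes the `wd:` prefix is predefined, as it is on the Wikidata Query Service.

```python
import re

# Item IDs look like Q followed by a number without leading zeros.
QID_RE = re.compile(r"Q[1-9][0-9]*")


def items_query(qids: list[str]) -> str:
    """SPARQL query returning exactly the given items, one per row, as ?item."""
    for qid in qids:
        if not QID_RE.fullmatch(qid):
            raise ValueError(f"not an item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return f"SELECT ?item WHERE {{ VALUES ?item {{ {values} }} }}"


print(items_query(["Q575202", "Q36948990"]))
# SELECT ?item WHERE { VALUES ?item { wd:Q575202 wd:Q36948990 } }
```

The resulting query can be pasted into the ShEx2 Validator in place of the single-item version.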
A big sorry: I'm currently figuring out possible workflows, and have indeed made E94 stricter, which causes Antifaschistisches Pressearchiv und Bildungszentrum Berlin (Q575202) to fail against it, while it passes the new, relaxed E95. This messed up the test case; sorry again!
Now, in multiple tests with some arbitrary clicking, I was not able to reproduce a "Pass Fail" case, so I suppose it went away with your bug fix, which also worked consistently well. Thank you for the quick fix! --Jneubert (talk) 08:01, 21 July 2019 (UTC)
May I suggest an extension of the script? The API already returns the reason for failing (e.g., [4]), so it should be possible to show it to the user on request (perhaps in a popup or on mouse-over, since the messages do not look nice but are helpful nonetheless). --Jneubert (talk) 08:10, 21 July 2019 (UTC)
Great idea. I've updated the user script so that it now shows some error information on failure. If there's a missing or incorrect property in the response, the property number is shown beside the Fail message. Additionally, the raw error response is shown when you hover over the Fail message. Teester (talk) 11:03, 23 July 2019 (UTC)
This is fantastic - thank you so much. --Jneubert (talk) 14:37, 23 July 2019 (UTC)
While adding the tool to the How to get started ... page, another possible improvement came to mind: on the item page, a tiny "schema" link to the right of the validation result would make it super easy to navigate to the selected schema. --Jneubert (talk) 17:57, 23 July 2019 (UTC)
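The property-number display described above could be approximated by scanning the failure reason for property IDs. This is a hypothetical Python sketch, not the userscript's code (which is JavaScript), and the sample reason text is invented, since the exact wording of the pyshexy API's messages is not shown here.

```python
import re

# Wikidata property IDs such as P749; the word boundaries keep "wdt:P749" matching
# while avoiding partial matches inside longer tokens.
PROPERTY_RE = re.compile(r"\bP[1-9][0-9]*\b")


def properties_in_reason(reason: str) -> list[str]:
    """Distinct property IDs mentioned in a failure message, in order of first appearance."""
    seen: list[str] = []
    for pid in PROPERTY_RE.findall(reason):
        if pid not in seen:
            seen.append(pid)
    return seen


# Hypothetical failure text; the real API's wording may differ.
reason = ("Testing wd:Q575202 against shape <organization>: "
          "no matches of wdt:P749 found; wdt:P749 expects at least 1")
print(properties_in_reason(reason))
```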

Add saved queries to EntitySchema entries?

The "check entities against this Schema" link on the schema pages is a great thing. However, it requires newbies and experts alike to write a query from scratch, which is tedious. Some Schema authors work around this by embedding example query code in the schema text as a comment, which helps, but looks a bit messy and still needs manual copy and paste into the query field.

So it would be great if we could save a query (or, even better, multiple named queries) with the schema. The code to load queries and allow user selection is already in place (see ShEx2 on Toolforge) via the "dataLabel" and "queryMap" parameters in the manifest file (though perhaps not yet as HTTP request query parameters).

On the Wikidata/Wikibase side, I wonder if setting the Wikidata SPARQL query equivalent (P3921) property could be enabled for EntitySchema entries. Together with named as (P1810) qualifiers, that would allow for multiple queries to be saved with each schema. --Jneubert (talk) 09:31, 21 July 2019 (UTC)
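A rough sketch of what "multiple named queries per schema" could look like as ShEx2 manifest entries: one entry per query, with `dataLabel` as the name shown in the picklist. Only `dataLabel` and `queryMap` come from the discussion; the other field names, the `SPARQL '''...'''@START` query-map form and the schema URL are assumptions to check against the ShEx2 documentation.

```python
import json

# Hypothetical named queries for one schema; names and SPARQL are illustrative only.
NAMED_QUERIES = {
    "Two press archives": "SELECT ?item WHERE { VALUES ?item { wd:Q575202 wd:Q36948990 } }",
    "One item": "SELECT ?item WHERE {BIND(wd:Q36948990 as ?item)} LIMIT 1",
}


def manifest_entries(schema_label: str, schema_url: str, queries: dict[str, str]) -> list[dict]:
    """One manifest entry per named query, so a user can pick a query by its dataLabel."""
    entries = []
    for name, sparql in queries.items():
        entries.append({
            "schemaLabel": schema_label,  # assumed field name
            "schemaURL": schema_url,      # assumed field name
            "dataLabel": name,            # label shown in the picklist
            # Assumed query-map form: run the query, check each result against START.
            "queryMap": f"SPARQL '''{sparql}'''@START",
        })
    return entries


manifest = manifest_entries(
    "E94",
    "https://www.wikidata.org/wiki/Special:EntitySchemaText/E94",  # assumed raw-ShExC URL
    NAMED_QUERIES,
)
print(json.dumps(manifest, indent=2))
```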

My idea was to re-use the property definition on EntitySchema:E123 itself, so that it could be used there directly (formatted as a link to ShEx2), rather than adding the property to an item about the schema. --Jneubert (talk) 09:46, 21 July 2019 (UTC)
This is currently not supported. Once an item is associated with a schema, you should be able to load its content on the schema page with Lua. --- Jura 10:09, 21 July 2019 (UTC)

Comparison between ShEx and constraints

What is possible with ShEx but not with constraints, and vice versa? What I have so far:

  • Constraints are tied to a property; shapes are "free": they can be checked against any item and reused
  • Constraints are somewhat easier to edit textually, and more efficient
  • Constraints are automatically checked by MediaWiki.
  • Shapes are more powerful; for example, it is possible to express that any property that is not explicitly allowed is forbidden (a closed shape)

Anything else, or anything wrong? It is unclear to me how type constraints can be expressed in shapes on Wikidata, as "rdf:type" is irrelevant for Wikidata items. Wikibase has domain and range constraints; I'm not sure these can be handled with shape expressions, as there seems to be no notion analogous to SPARQL property paths in ShEx. author  TomT0m / talk page 19:42, 21 July 2019 (UTC)
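The difference between per-property constraints and a closed shape can be shown with a toy Python model. This is not real ShEx semantics: it only counts values per property and ignores value constraints, `EXTRA`, qualifiers and the other RDF predicates Wikidata emits. Each triple constraint bounds the number of values of one property, much like a constraint attached to that property; only the `closed` flag can reject properties the shape never mentions.

```python
from dataclasses import dataclass
from typing import Optional

# Toy item: property ID -> list of values (qualifiers, references, labels ignored).
Item = dict[str, list[str]]


@dataclass
class TripleConstraint:
    prop: str
    min_count: int = 0
    max_count: Optional[int] = None  # None means unbounded, as with ShEx '*' or '+'


def check_shape(item: Item, constraints: list[TripleConstraint], closed: bool = False) -> list[str]:
    """Return violations; an empty list means the toy item conforms to the toy shape."""
    errors = []
    for tc in constraints:
        n = len(item.get(tc.prop, []))
        too_many = tc.max_count is not None and n > tc.max_count
        if n < tc.min_count or too_many:
            upper = "*" if tc.max_count is None else tc.max_count
            errors.append(f"{tc.prop}: {n} value(s), expected {tc.min_count}..{upper}")
    if closed:
        # A closed shape rejects every property it does not mention.
        allowed = {tc.prop for tc in constraints}
        errors.extend(f"{p}: not allowed by closed shape" for p in item if p not in allowed)
    return errors


human = [TripleConstraint("P31", 1), TripleConstraint("P569", 1), TripleConstraint("P21", 0, 1)]
item = {"P31": ["Q5"], "P569": ["1952-03-11"], "P106": ["Q36180"]}
print(check_shape(item, human))               # open shape: the extra P106 is fine
print(check_shape(item, human, closed=True))  # closed shape: P106 is rejected
```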

Correction: it is definitely possible to express paths, my mistake (this is used in the example shape for file formats, and shown, for example, on the 13th slide of this comparison between ShEx and SHACL). author  TomT0m / talk page
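However a shape encodes it, the type check in question is the one SPARQL writes as the property path `wdt:P31/wdt:P279*`: some "instance of" value reaches the target class through zero or more "subclass of" steps. A toy Python version over a hand-made, hypothetical class hierarchy (the class names are not real Wikidata data):

```python
from collections import deque

# Hypothetical toy hierarchy: class -> its "subclass of" (P279) parents.
SUBCLASS_OF = {
    "press archive": ["archive"],
    "archive": ["organization"],
    "organization": [],
}


def is_instance_of(p31_values: list[str], target: str, subclass_of: dict[str, list[str]]) -> bool:
    """True if some P31 value reaches target via zero or more P279 steps (wdt:P31/wdt:P279*)."""
    queue = deque(p31_values)
    seen = set(p31_values)
    while queue:
        cls = queue.popleft()
        if cls == target:
            return True
        for parent in subclass_of.get(cls, []):
            if parent not in seen:  # guard against cycles in the class graph
                seen.add(parent)
                queue.append(parent)
    return False


print(is_instance_of(["press archive"], "organization", SUBCLASS_OF))
```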

Could we generate constraints from shapes, and vice versa?

author  TomT0m / talk page 19:42, 21 July 2019 (UTC)
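One small piece of the shapes-and-constraints mapping is mechanical: if a property constraint is reduced to a minimum and maximum number of values (for example, reading a single-value constraint as "at most one"), the matching ShExC cardinality follows directly. A hypothetical Python helper using ShExC's `?`, `*`, `+` and `{m,n}` forms, producing lines in the style of E10:

```python
from typing import Optional


def shex_cardinality(min_count: int, max_count: Optional[int]) -> str:
    """ShExC cardinality suffix for the bounds; max_count=None means unbounded."""
    if min_count < 0 or (max_count is not None and max_count < min_count):
        raise ValueError(f"bad bounds: {min_count}..{max_count}")
    if max_count is None:
        return {0: "*", 1: "+"}.get(min_count, f"{{{min_count},}}")
    if (min_count, max_count) == (1, 1):
        return ""  # ShEx default: exactly one
    if (min_count, max_count) == (0, 1):
        return "?"
    if min_count == max_count:
        return f"{{{min_count}}}"
    return f"{{{min_count},{max_count}}}"


def triple_constraint(prop: str, min_count: int, max_count: Optional[int]) -> str:
    """One ShExC line accepting any value ('.') for the property."""
    card = shex_cardinality(min_count, max_count)
    return f"wdt:{prop} . {card};" if card else f"wdt:{prop} .;"


print(triple_constraint("P569", 1, None))  # wdt:P569 . +;
print(triple_constraint("P21", 0, 1))      # wdt:P21 . ?;
print(triple_constraint("P19", 1, 1))      # wdt:P19 .;
```

The reverse direction is harder, since a shape can combine properties in ways no single-property constraint describes.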

Rename this project

The project should be about entity schemas, not about ShEx. The latter is just the technical language to write entity schemas in. Anybody against renaming the project? -- JakobVoss (talk) 08:42, 25 October 2019 (UTC)

  • Support; maybe we should call it just "WikiProject Schemas"? —MisterSynergy (talk) 09:07, 25 October 2019 (UTC)
Ok, I was bold, renamed it and created some additional redirects such as Help:Schemas to avoid creating too many separate pages. -- JakobVoss (talk) 12:44, 26 October 2019 (UTC)

Lack of help

There is also a lack of help. No mention of schemas in the Help namespace so far. There should be Help:Schemas just like Help:Constraints. -- JakobVoss (talk) 08:48, 25 October 2019 (UTC)

  • There is Wikidata:WikiProject ShEx/How to get started? as the only subpage of this project at this point, and it clearly is "work in progress". We definitely need to collect some more experience with Schemas in Wikidata in order to come up with a helpful help page. —MisterSynergy (talk) 09:10, 25 October 2019 (UTC)
In my opinion, the technical references and links to standards and implementations should be removed. For an overview of ShEx in general there is en:ShEx. This page, in contrast, should focus on the use of ShEx in and for Wikidata. -- JakobVoss (talk) 12:46, 26 October 2019 (UTC)

Request a Schema page

Schemas are still hard for people for various reasons. We had the same problem with queries and one thing that beautifully helped was the Request a Query page. There anyone who doesn't know how to write sparql can ask for help from people who can. I think a similar Request a Schema page could be super helpful to get more wiki projects to adopt Schemas. Thoughts? --Lydia Pintscher (WMDE) (talk) 13:27, 31 October 2019 (UTC)

Support Currently, there are only around 140 entity schemas. This number could grow with a dedicated page for schema-related questions. John Samuel (talk) 13:32, 31 October 2019 (UTC)
Support Fabulous idea. Do we have people willing to build schemas on request? - PKM (talk) 03:13, 15 November 2019 (UTC)

Human readable schemas

One of the biggest problems with schemas right now is that they are difficult to understand without sufficient technical knowledge. But it seems to me that it should be possible to translate a schema into human readable language without too much difficulty, for the most part.

For example, if my understanding of shex is correct, currently E10 could be translated as follows:

Schema Translation
 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

start = @<human>

<human> EXTRA wdt:P31 {
  wdt:P31 [wd:Q5];
  wdt:P21 [wd:Q6581097 wd:Q6581072 wd:Q1097630 wd:Q1052281 wd:Q2449503 wd:Q48270]?;   # gender
  wdt:P19 .;                     # place of birth
  wdt:P569 . + ;                 # date of birth
  wdt:P735 . * ;                 # given name
  wdt:P734 . * ;                 # family name
  wdt:P106 . * ;                 # occupation
  wdt:P27 @<country> *;  # country of citizenship
  rdfs:label rdf:langString+;
}

<country> EXTRA wdt:P31 {
  wdt:P31 [wd:Q6256 wd:Q3024240 wd:Q3624078] +;
}
  • start with <human>

I could see this sort of thing being useful as part of a schema's talk page, similar to how properties' talk pages contain a template with useful information about a property and its constraints. Does anyone know of a service that will translate a schema into human-readable language or vice versa? Teester (talk) 13:46, 16 November 2019 (UTC)
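A very rough idea of how such a translation could work for the simplest lines of E10, sketched in Python with a regular expression. This is a toy, not Teester's tool: it only handles `wdt:Pnnn <value> <cardinality>;` lines, takes the property name from the trailing comment, and ignores `EXTRA`, the contents of referenced shapes, and the rest of ShExC.

```python
import re
from typing import Optional

# Matches simple lines such as "wdt:P569 . + ;   # date of birth".
LINE_RE = re.compile(r"^\s*wdt:(P\d+)\s+(.*?)\s*([?*+]?)\s*;\s*(?:#\s*(.*?))?\s*$")

CARDINALITY = {
    "": "exactly one matching value",
    "?": "at most one matching value",
    "*": "any number of matching values",
    "+": "at least one matching value",
}


def describe_value(expr: str) -> str:
    """Short English phrase for the value part of a triple constraint."""
    if expr == ".":
        return ""  # '.' accepts any value
    if expr.startswith("[") and expr.endswith("]"):
        return ", taken from: " + ", ".join(expr[1:-1].split())
    if expr.startswith("@"):
        return f", each matching shape {expr[1:]}"
    return f", matching {expr}"


def translate_line(line: str) -> Optional[str]:
    """English for one simple triple-constraint line, or None if the line is not of that form."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    pid, expr, card, label = m.groups()
    name = f"{pid} ({label})" if label else pid
    return f"{name}: {CARDINALITY[card]}{describe_value(expr)}"


E10_BODY = """\
  wdt:P31 [wd:Q5];
  wdt:P569 . + ;                 # date of birth
  wdt:P27 @<country> *;  # country of citizenship
  rdfs:label rdf:langString+;"""

for line in E10_BODY.splitlines():
    text = translate_line(line)
    if text is not None:
        print("-", text)
```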

Since there seems to be nothing that can translate schemas into human readable language, I've put something together at https://tools-static.wmflabs.org/shextranslator/ Any feedback would be appreciated. Teester (talk) 12:23, 23 November 2019 (UTC)
  • Schemas have great potential to become a good tool, but in their present implementation, I don't think we can or should expect users to rely on them as a primary means of understanding which properties to add or which statements to fix.
  • A human readable version should always be outlined on a WikiProject page or with property constraints. --- Jura 12:42, 23 November 2019 (UTC)

PyShexy and sparql query

https://tools-static.wmflabs.org/pyshexy/ Has anyone figured out a way to get it to work with a SPARQL query? I tried hard but failed; I get an HTTP 500 error. Example: query, pyshexy url --So9q (talk) 23:33, 25 November 2019 (UTC)