User:M.alten.tue


About Me

I am a master's student at the Eindhoven University of Technology. My focus is on databases, mostly relational ones but more recently also graph-based databases like Wikidata. Currently I am writing my thesis on improving the UI of ShEx validators, specifically the ShEx Simple Online Validator used on Wikidata. The goal of the project is to make it easier to use the validator to change data so that it conforms to existing schemas. This fits within the existing vision of Wikidata to encourage high data quality by using schemas to promote standardization.

I live in the Netherlands, and my timezone is Central European Time (CET). If you want to contact me outside Wikidata, mail m.v.alten@student.tue.nl.

The Tabular Validator

I have recently released a new version of the validator used for schema validation in Wikidata. The main change is a new look for the validation reports that the validator generates. Some features are still a work in progress, and some bugs and issues remain, but the validator can now be used and is hosted on Toolforge.

Validator Evaluation

I invite all Wikidata users to help me evaluate the new validator in an interview with me. During this interview, you will be asked to fix some nonconformances in a Wikibase Cloud instance created for this evaluation while I observe. Afterwards, we will talk about your experiences, and you will fill out a questionnaire about the validator tools you used.

If you are interested in helping me with the evaluation, you can sign up for an interview slot in Datumprikker. Once you have done so, fill in the consent form linked there. For questions about the evaluation, mail me at m.v.alten@student.tue.nl, or ask a question on my talk page.

Using the Validator

The new validator expects the same input as the old validator. However, it is not built on the exact version of the validator that Wikidata uses, and it has some new fields. When you open the validator, there are three main text fields: the blue schema input on the left, the green data input on the right, and the query input further down. If you are using Wikidata, enter "Endpoint: https://query.wikidata.org/sparql" into the data field. To choose a schema, copy and paste the text of the entity schema you want to validate against into the blue field on the left. Then, to add your data, write a SPARQL query in the text field labelled 'Query Map'. Directly before the query, put

SPARQL'''

and directly after it, put

'''@START
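The wrapping described above can be sketched in a few lines of Python. This is only an illustration of the text format, not part of the validator: the function name build_query_map and the example query are hypothetical, and trimming whitespace around the query is an assumption rather than a documented requirement.

```python
def build_query_map(sparql_query: str) -> str:
    """Wrap a SPARQL query in the text that the 'Query Map' field expects.

    Hypothetical helper: it only concatenates the markers described above.
    """
    # Assumption: a query containing ''' would end the wrapper early,
    # so such input is rejected instead of being escaped.
    if "'''" in sparql_query:
        raise ValueError("the query must not contain three single quotes in a row")
    # Assumption: surrounding whitespace is not significant, so it is trimmed.
    return "SPARQL'''" + sparql_query.strip() + "'''@START"


# Example query (illustrative only): ten items that are instances of human.
example_query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10"
print(build_query_map(example_query))
```

The printed line can be pasted into the 'Query Map' field as is.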

To validate your data against this schema, press the validate button. This will generate a table and an error that says "error validating: elt.attr is not a function". The error can be ignored. The table shows a different possible nonconformance on each row.

If you find a nonconformance with the error type "ClosedShapeViolation", "SemActFailure" or "TypeMismatch", you will notice that the output is not very useful. This is because these error types are not properly implemented yet. Please send me an email with the schema and query you used and a screenshot of the table, and do not fix the nonconformance. This will help me develop the code to handle these types of nonconformances.

Using a different Wikibase Instance

If you are using a different Wikibase instance, such as a Wikibase Cloud instance, you need to do the following to make the validator query your data rather than Wikidata:

  • In the schema field, add any prefixes used in your schema, for example "PREFIX p: <https://www.validatortest.wikibase.cloud/prop/>".
  • In the data field, replace the endpoint with the SPARQL query endpoint for your Wikibase.
  • In the Query Endpoint and Wikibase Prefix fields, add the query endpoint and the start of the link to your Wikibase. Note that unlike the preview, the Wikibase Prefix should not end with a slash (/).

Note that Wikibase Cloud prefix links start with https, while Wikidata links start with http.
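As a rough sketch of the values described above, the following Python helpers produce the text for each field. All three function names are hypothetical and exist only for illustration. The query endpoint of your own Wikibase is not stated on this page, so it is passed in as an argument rather than guessed.

```python
def prefix_declaration(name: str, iri: str) -> str:
    """Build one PREFIX line to add to the schema field."""
    return f"PREFIX {name}: <{iri}>"


def data_field_entry(sparql_endpoint: str) -> str:
    """Build the text for the data field from a SPARQL query endpoint URL."""
    return f"Endpoint: {sparql_endpoint}"


def wikibase_prefix(wikibase_url: str) -> str:
    """Return the Wikibase Prefix value, which should not end with a slash."""
    return wikibase_url.rstrip("/")


# Values taken from the examples on this page.
print(prefix_declaration("p", "https://www.validatortest.wikibase.cloud/prop/"))
print(data_field_entry("https://query.wikidata.org/sparql"))
print(wikibase_prefix("https://www.validatortest.wikibase.cloud/"))
```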


Future Development

There are a few main features I still want to implement, and I would like to get rid of some bugs.

Future Features:

  • Show the 'inheritance path' of errors caused by a non-local schema: At the moment, all you can see is the item whose conformance the validator started checking, the shape being checked, and the property and value that cause the problem. I want to show the path of items and properties through which the error propagated, to make it clearer which page you should look at for non-local errors.
  • Export to Wikidata table: Currently everyone needs to run the validator for themselves. For a WikiProject, however, it could be useful to have a table of current issues for people to work on, so that not everyone has to be able to use the validator.
  • Adopt the Simple Online Validator features that remove the need for the text around each SPARQL query.
  • Add an option to show items that validate as correct, so you can feel happy seeing a screen full of green rows.

Known bugs:

  • Sometimes the table is printed twice, side by side.
  • After rendering the table, an issue results in the message "error validating: elt.attr is not a function".

Main Version History

2024-05-02

  • Added an option to show conformant items (in green)
  • Fixed a bug that prevented using endpoints other than Wikidata
  • Removed some old console logging

2024-04-29: First release

Hosted the initial version of the validator with table output mode.