Wikidata talk:SPARQL federation input


dati.beniculturali.it

Hi all, Mgp17 and I are part of the IT Department at the National Central Library of Florence (Italy). We have nominated a new endpoint on the project page: http://dati.beniculturali.it/ is the LOD framework of the Italian Ministry of Cultural Heritage. It would be very important for us to have it supported and added to the list of databases for SPARQL federated queries in a relatively short time, because we are preparing for the Wikidata Hackathon at the National Library on November 10th. @Lydia_Pintscher_(WMDE): Thank you! Nonoranonqui (talk) 10:23, 6 October 2017 (UTC)

Other discussion

I think the most useful thing would be to link external identifier properties with their SPARQL endpoints, if they have one. I could imagine useful queries to compare for example labels from both sources to verify the identifier relationships, or to pull additional properties like country, website, dates, etc. However, on a quick check just now most of the identifiers I could think of don't seem to have open SPARQL endpoints after all... ArthurPSmith (talk) 18:09, 23 February 2017 (UTC)

  • +1 to this: identifying which external identifiers are most widely used and have SPARQL endpoints in their context. For example, I am imagining interest in tools like Getty's vocabularies: http://vocab.getty.edu/sparql , or OCLC's resources. Sadads (talk) 18:28, 23 February 2017 (UTC)
  • Support. Sounds good if such an endpoint exists for this dataset. --Smalyshev (WMF) (talk) 00:31, 28 February 2017 (UTC)

The W3C have a handy list: https://www.w3.org/wiki/SparqlEndpoints Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:31, 23 February 2017 (UTC)

Which doesn't mention Wikidata! I've informed one of my contacts there. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:55, 23 February 2017 (UTC)
Now fixed. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:54, 23 February 2017 (UTC)
Really? Placed on top of an alphabetical list. Did not mention the service is a beta version. --Succu (talk) 22:19, 23 February 2017 (UTC)
There is also a bit of SPARQL endpoint info in Wikidata itself: https://query.wikidata.org/#%23Websites%20with%20OpenAPI%20endpoints%0ASELECT%20%3Fdatabase%20%3FdatabaseLabel%20%3Fvalue%20WHERE%20%7B%0A%20%20%3Fdatabase%20%3Fp%20%3Fwds.%0A%20%20%3Fwds%20%3Fv%20%3Fvalue.%0A%20%20%3FwdP%20wikibase%3AstatementProperty%20%3Fv.%0A%20%20%3FwdP%20wikibase%3Aclaim%20%3Fp.%0A%20%20%3Fwds%20pq%3AP31%20wd%3AQ26261192.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D --Egon Willighagen (talk) 08:09, 25 February 2017 (UTC)

SPARQL federation the other way round

Direct support of approved SPARQL endpoints in the Wikidata query service is surely the most convenient solution, but how about integrating the Wikidata query service into your own SPARQL endpoint? This might already be possible, but I'm not familiar with setting up SPARQL endpoints. A how-to or some hints would help people run queries across Wikidata and their own data. -- JakobVoss (talk) 08:12, 24 February 2017 (UTC)

Yes, it is possible - just use https://query.wikidata.org/sparql as the SERVICE endpoint in the federated query. --Smalyshev (WMF) (talk) 20:59, 24 February 2017 (UTC)
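
For anyone wanting to try this from their own endpoint, a minimal sketch of such a query follows. It assumes the local endpoint supports SPARQL 1.1 federation and is allowed to make outgoing requests; the class Q146 (house cat), the English-label filter and the LIMIT are illustrative choices only and do not come from the discussion above.

```sparql
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Run this on your own SPARQL endpoint, not on query.wikidata.org.
SELECT ?item ?label WHERE {
  # Everything inside SERVICE is sent to the Wikidata Query Service.
  SERVICE <https://query.wikidata.org/sparql> {
    ?item wdt:P31 wd:Q146 ;       # illustrative: instances of "house cat"
          rdfs:label ?label .
    FILTER(LANG(?label) = "en")
  }
  # Patterns over your own data would go here, joined on shared variables.
}
LIMIT 5
```

The prefixes are declared explicitly because an endpoint other than Wikidata's is unlikely to predefine wd: and wdt:.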

Talk page?

Should we have discussion on the talk page instead? I'd like to move it to the talk page if nobody objects. IMHO it is easier to keep the data part and the general discussion part separate. --Smalyshev (WMF) (talk) 21:15, 24 February 2017 (UTC)

data.cervantesvirtual.com

SPARQL endpoint. Terms of Use (CC Zero), BVMC person ID (P2799), Biblioteca Virtual Miguel de Cervantes (Q4903493). Strakhov (talk) 00:41, 8 March 2017 (UTC)

datos.bne.es

SPARQL endpoint. Terms of Use (CC Zero), Biblioteca Nacional de España ID (P950), Biblioteca Nacional de España (Q750403). Strakhov (talk) 00:41, 8 March 2017 (UTC)

Archiving & updating

I've archived the implemented endpoints into the Wikidata:SPARQL_federation_input/Archive subpage, and changed the page a bit to make nominating easier. Please tell me if something doesn't work. --Smalyshev (WMF) (talk) 19:36, 11 April 2017 (UTC)

Providing examples?

Hi all, I found this page while searching for whether (and how) it is possible to write a SPARQL query combining different sources. I am happy to learn that it is possible, and already implemented for some sources on the Wikidata endpoint according to the archive page, but it would be great if some example queries could be added so that SPARQL newbies like me can easily see how useful it can be. Symac (talk) 06:39, 17 May 2017 (UTC)

Hi Symac :)
You can find some examples here. It's still something we should improve. I created a dedicated task for the documentation sprint that will happen during the Wikimedia hackathon! Lea Lacroix (WMDE) (talk) 13:54, 17 May 2017 (UTC)
See also Wikidata:SPARQL_query_service/Federated_queries for one more. Jheald (talk) 09:47, 18 May 2017 (UTC)

Federated Woes

I was trying to get WD and DBpedia links for AAC artists, which themselves are linked to ULAN.

1. There are 11731 coreferences from AAC http://yasgui.org/short/Hk1RooxVM to 6679 ULAN artists http://yasgui.org/short/BJ5S3igVG.

2. We can use Wikidata federated SPARQL to get all their links to Wikidata and DBpedia. E.g. here's how to get the links about one

3. Unfortunately the AAC SPARQL endpoint (Fuseki) has some trouble understanding federated querying: http://yasgui.org/short/B1rP1hx4z returns Error 500: Failed when initializing the StAX parsing engine. Fuseki - version 2.4.0 (Build date: 2016-05-10T11:59:39+0000)

4. So I tried a federated query from Wikidata to AAC. But Wikidata returned the error "Service URI http://data.americanartcollaborative.org/sparql is not allowed". I think the reason is that Wikidata works with a whitelist of endpoints (https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input) and others are not allowed.

5. So I tried to run this query on the DBpedia endpoint, calling out to AAC and Wikidata. But I got the error: Virtuoso 42000 Error SQ070:SECURITY: Must have select privileges on view DB.DBA.SPARQL_SINV_2.

OpenLink answered: The #DBpedia endpoint specifically disables #SPARQL-FED for security reasons. You can always use our #URIBurner instead

6. I tried the same query on our own http://factforge.net/sparql. And finally success!!

The query is

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>

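select ?ulan ?wd ?db {
  # 1. Collect the distinct ULAN URIs that AAC records link to
  {select distinct ?ulan
    {service <http://data.americanartcollaborative.org/sparql> {?x skos:exactMatch ?ulan}}}
  # 2. Find the Wikidata item whose normalized (URI-valued) P245 matches ?ulan,
  #    plus the pages that are about that item
  {service <https://query.wikidata.org/sparql> {
    ?wd <http://www.wikidata.org/prop/direct-normalized/P245> ?ulan.
    ?wp schema:about ?wd}}
  # 3. Keep only English Wikipedia pages and derive the DBpedia URI from the page URL
  filter(regex(str(?wp),"https?://en.wikipedia.org/wiki/"))
  bind(uri(replace(str(?wp),"https?://en.wikipedia.org/wiki/","http://dbpedia.org/resource/")) as ?db)
}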

Question @Smalyshev (WMF): why doesn't Wikidata allow a federated query to http://data.americanartcollaborative.org/sparql? (Does it take a PhD to use federation?)


PS: maybe I should move this to Wikidata:SPARQL query service/Federated queries? --Vladimir Alexiev (talk) 17:15, 8 January 2018 (UTC)

@Vladimir Alexiev: what's the license of the data in http://data.americanartcollaborative.org/sparql ? Multichill (talk) 17:38, 8 January 2018 (UTC)
Yes, we have a whitelist. Please file a request at the https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input page, and if the license etc. match, we will add it soon. Smalyshev (WMF) (talk) 19:07, 10 May 2018 (UTC)

SPARQL 1.1 Protocol parameters

Please allow the use of the default-graph-uri and named-graph-uri parameters, at least for DBpedia.
It seems that something like SERVICE <http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org> is not allowed.

I will look into it. Smalyshev (WMF) (talk) 19:56, 10 May 2018 (UTC)
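
While protocol parameters are not accepted in the SERVICE URI, one possible workaround is to select the graph inside the remote pattern with a GRAPH clause. This is only a sketch under assumptions: it assumes DBpedia exposes its data as the named graph <http://dbpedia.org> and that the plain http://dbpedia.org/sparql endpoint is accepted by the query service. The resource used is an arbitrary example.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    # GRAPH restricts matching to one named graph, which is similar in
    # effect to passing default-graph-uri as a protocol parameter.
    GRAPH <http://dbpedia.org> {
      <http://dbpedia.org/resource/Florence> rdfs:label ?label .   # arbitrary example resource
      FILTER(LANG(?label) = "en")
    }
  }
}
```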

Query the list of endpoints?

@Smalyshev (WMF):

Is there a way to do this:

 BIND(uri("http://data.cervantesvirtual.com/openrdf-sesame/repositories/data") as ?endpoint)
 SERVICE ?endpoint  { .. }

Or even

 ?item wdt:P123456 ?endpoint .
 SERVICE ?endpoint  { .. }

It would be interesting if ?endpoint was directly a value from a statement on the item for the service.
--- Jura 21:44, 5 June 2018 (UTC)


@Smalyshev (WMF): Doesn't the standard allow this format?
--- Jura 09:45, 8 June 2018 (UTC)

@Jura1: Looking at the grammar: https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#sparqlGrammar I think variables are allowed in the SERVICE clause. However, Blazegraph does not seem to expect it there. I am not sure it isn't a bug, in fact. I need to check it. If you submit a Phabricator task, that would help track it. Smalyshev (WMF) (talk) 23:42, 8 June 2018 (UTC)
Thanks for looking into this: T196798. I'm trying to learn how federated queries work.
--- Jura 04:06, 9 June 2018 (UTC)
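
For reference, the form under discussion written out as a complete query might look like the sketch below. The SPARQL 1.1 grammar accepts a variable after SERVICE, but whether a given engine evaluates it is implementation-dependent, and as noted above Blazegraph may reject it. The inner triple pattern is a placeholder, not a meaningful query.

```sparql
SELECT ?endpoint ?s ?p ?o WHERE {
  # Bind the endpoint explicitly so that the variable has a value to use.
  VALUES ?endpoint { <http://data.cervantesvirtual.com/openrdf-sesame/repositories/data> }
  SERVICE ?endpoint {
    SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1   # placeholder pattern
  }
}
```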

Bnb not working and not failing SILENTly

@Smalyshev (WMF):

The endpoint below is listed here, but isn't accessible.

SELECT * { SERVICE SILENT <http://bnb.data.bl.uk/sparql> { SELECT * { ?b1 ?n1 ?b2 } LIMIT 1 } }

With "SILENT", shouldn't this fail silently?
--- Jura 06:59, 10 June 2018 (UTC)

SILENT not working when query times out

@Smalyshev (WMF):

Somewhat related, but different from the above:

Queries to <http://tools.wmflabs.org/mw2sparql/sparql> seem to work, but time out (no data is returned). With "SILENT", shouldn't they fail silently?
--- Jura 08:50, 13 June 2018 (UTC)

Yes, I think they should. Probably a bug. Smalyshev (WMF) (talk) 21:18, 28 July 2018 (UTC)
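
For context, the SPARQL 1.1 Federated Query recommendation says that with SILENT, errors encountered while accessing the remote endpoint should be ignored, and a failed SERVICE clause should yield a single solution with no bindings. A sketch of what that would mean in practice, assuming the endpoint is unreachable; the VALUES row is a hypothetical addition that only serves to make the expected result visible:

```sparql
SELECT ?local ?s WHERE {
  VALUES ?local { "still returned" }
  # If this endpoint fails, SILENT should turn the failure into one empty
  # solution, so the row for ?local survives with ?s left unbound.
  SERVICE SILENT <http://bnb.data.bl.uk/sparql> {
    SELECT ?s WHERE { ?s ?p ?o } LIMIT 1
  }
}
```

An engine that aborts the whole query instead is not following that behaviour, which matches the suspicion above that this is a bug.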

Queries with Federation

Wikidata:Request_a_query#Using_federation .. maybe it's a federation question, maybe not.
--- Jura 13:46, 12 June 2018 (UTC)

They seem to work consistently when run locally. Probably another bug unless it's the undefinedness mentioned by Lucas Werkmeister at phab:T196798#4269194.
BTW shall we create a tracking bug for all federation issues?
--- Jura 04:55, 15 June 2018 (UTC)

Federation over SSL throws error

Thank you very much for accepting the endpoint of the City of Zurich https://ld.stadt-zuerich.ch/query for query federation. Unfortunately, it currently does not work: we receive a protocol_version error when executing the following query:

SELECT *
WHERE 
{
  SERVICE <https://ld.stadt-zuerich.ch/query> {
    SELECT * WHERE {
      ?Kennzahl a <http://purl.org/linked-data/cube#AttributeProperty> ;
      <http://www.w3.org/2000/01/rdf-schema#label> ?KennzahlLabel .
    } 
  }
}

Our technician had a look at the Java stacktrace. One of many possible reasons could be that WDQS does not support TLSv1.2. A good writeup including instructions for diagnosis and tweaking is at https://blogs.oracle.com/java-platform-group/diagnosing-tls,-ssl,-and-https. He tested connecting to https://ld.stadt-zuerich.ch/query with two JDKs that I have at hand, openjdk-7 and openjdk-8. Connecting works for both versions out of the box. One way I figured out to make it fail is to artificially restrict the protocol versions by using the "-Dhttps.protocols=TLSv1,TLSv1.1" parameter for the client. Could you comment on this? Is there an SSL restriction on the Wikidata side, or can we do something to fix this issue? --GrandGrue (talk) 14:39, 23 August 2018 (UTC)

@Smalyshev (WMF): would you have any hint regarding this issue? --- Cristina Sarasua (talk) 14:56, 23 August 2018 (UTC)

Meanwhile, we lowered the minimal TLS version on our side from 1.2 to 1.1, but that didn't resolve the issue. Are there explicit requirements from Wikidata on the TLS that federated endpoints have to fulfill (version, ciphers, ...)? --Mchlrch (talk) 15:01, 23 August 2018 (UTC)
This is something coming from the Java SSL engine; I will need to dig into it and see what's going on. Smalyshev (WMF) (talk) 23:15, 24 August 2018 (UTC)
@GrandGrue: Please see details of the issue at https://phabricator.wikimedia.org/T202785 - looks like the server is using HTTP/2, which is not currently supported well by the Java client we're using (Jetty). Smalyshev (WMF) (talk) 00:29, 28 August 2018 (UTC)

Federated Endpoints: date of addition

Is the information about the date when the federated endpoints were added to the Wikidata ecosystem published or stored somewhere? Cristina Sarasua (talk) 16:58, 22 October 2019 (UTC)

@Criscod: https://github.com/wikimedia/wikidata-query-deploy/commits/master/whitelist.txt Multichill (talk) 19:24, 31 March 2020 (UTC)
This is really useful, thanks! Cristina Sarasua (talk) 07:58, 1 April 2020 (UTC)

Updated URL for the WikiPathways SPARQL endpoint

Due to the recent cyber attack at Maastricht University, the WikiPathways SPARQL endpoint (while not affected by the attack) has been decoupled from the internet. We have set up a different service, but it runs at http://sparql.wikipathways.org/sparql instead of http://sparql.wikipathways.org/ Can I request an update of the WDQS entry for this SPARQL endpoint, please? --Egon Willighagen (talk) 15:26, 15 January 2020 (UTC)

@Egon Willighagen: it needs to be updated in https://github.com/wikimedia/wikidata-query-deploy/blob/master/whitelist.txt . Created a task in Phabricator for you. Multichill (talk) 19:29, 31 March 2020 (UTC)
Awesome pointer! I did not know of that list. I'm on it! --Egon Willighagen (talk) 19:37, 31 March 2020 (UTC)

License requirements for nomination

Hi, I am currently checking whether Finto's SPARQL endpoint ( http://api.finto.fi/sparql ) could be nominated as an allowed endpoint for query.wikidata.org. However, the datasets are under multiple licenses

I think that getting visible license information for the unknowns is just a matter of notifying Finto that it is missing, so the missing information is not a problem at this point.

However, are the CC-BY-SA, ODbL or unitsofmeasure.org licenses blockers for nomination? If I understand correctly, we have already enabled MWAPI, which shares data under CC-BY-SA, and Sophox, which is under ODbL, so those aren't a problem anymore? How about unitsofmeasure.org? --Zache (talk) 07:22, 2 October 2020 (UTC)

marked as historic

Since Stas left, nobody from the WMF is monitoring this page anymore. Multichill (talk) 19:22, 8 October 2020 (UTC)

@Multichill: Thanks for the information. Do you know what the correct way is to propose new additions to the allowed federation endpoints? I.e., should I create a Phabricator ticket or ask in project chat what to do? --Zache (talk) 11:36, 14 October 2020 (UTC)
We are looking at reviewing that process, including better defining who should be responsible for approving additional federation endpoints (this feels more like it should be a community decision; Search Platform should not be the gatekeeper of federation, except if it has performance or stability impacts on WDQS). We also need to have a better understanding of the cost (performance / stability) of federation. A Phab task has been opened to track this. --GLederrey (WMF) (talk) 11:46, 15 October 2020 (UTC)
I created a new discussion topic in project chat: Updating SPARQL federation input review process --Zache (talk) 09:35, 25 October 2020 (UTC)

@Zache, GLederrey (WMF), Multichill, Nikola Tulechki:

  • So what's the outcome of the review? There was a new request on this page in the meantime.
Shall we "unmark" it as historic?
An alternative could be to use Wikidata:Contact_the_development_team/Query_Service_and_search (the feedback page of the team adding and removing federation).
I'm mostly interested in this for the proposal made on "Query_Service_and_search" this week. --- Jura 12:30, 25 March 2021 (UTC)
Currently this page is a dead end where proposals come to silently be forgotten. As long as this page isn't monitored it will stay that way regardless of what template is at the top. Not that people bother to read it anyway, as User:YULdigitalpreservation proved. Multichill (talk) 18:14, 25 March 2021 (UTC)
@DanBri, Removena, Olaf Simons: this page is historic and not monitored at all. Adding something here has no effect at all. I will fully protect the page to prevent more people from making the same mistake. Multichill (talk) 16:49, 31 May 2022 (UTC)
Okay! Do you know where the correct place is to get in line for federation? Removena (talk) 08:52, 2 June 2022 (UTC)
Please contact user:Sannita (WMF) for help. He is the person at the Wikimedia Foundation doing community relations for the Wikidata Query Service. Multichill (talk) 15:59, 2 June 2022 (UTC)
Just to be sure: I am merely helping the Search team in moving away from Blazegraph towards the new engine behind WDQS (which is still to be defined). Probably the question about federation should be put to Wikimedia Deutschland. Sannita (WMF) (talk) 16:08, 2 June 2022 (UTC)
@Sannita (WMF): thanks for your quick reply, but you completely missed the point. Blazegraph (or whatever replacement it might get) is the responsibility of the search team. Wikimedia Deutschland is not going to be of any use here. 20:21, 2 June 2022 (UTC)