Wikidata:SPARQL federation input


One of the cool features of SPARQL is federation: it lets you query several SPARQL endpoints together and get a combined result. To better integrate the data in Wikidata with other linked data sources, we plan to enable SPARQL federation on the Wikidata Query Service for a selected number of other SPARQL endpoints. For security and performance reasons, we cannot allow arbitrary endpoints without filtering, so we need a whitelist of approved endpoints. This page is for nominating and discussing which endpoints should be supported. Currently supported endpoints are listed in the User Manual.
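As an illustration of what a federated query looks like, here is a minimal Python sketch that assembles one as a string. It assumes the remote data can be joined on the values of a Wikidata property; the endpoint URL `https://example.org/sparql` and the predicate `https://example.org/vocab/accession` are hypothetical placeholders, not real services, and P352 is used only because it is mentioned further down this page.

```python
def build_federated_query(remote_endpoint, wikidata_property, remote_predicate, limit=10):
    """Return a SPARQL query that joins Wikidata values with a remote endpoint.

    The local pattern reads the values of one Wikidata property; the SERVICE
    block asks the remote endpoint for resources carrying the same value.
    """
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?item ?externalId ?remote WHERE {\n"
        f"  ?item wdt:{wikidata_property} ?externalId .\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    ?remote <{remote_predicate}> ?externalId .\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Hypothetical endpoint and predicate, for illustration only.
    print(build_federated_query(
        "https://example.org/sparql",
        "P352",
        "https://example.org/vocab/accession",
        limit=5,
    ))
```

The query only runs if the endpoint named in the SERVICE clause is on the whitelist; whether the join matches anything also depends on the remote side storing the identifier in the same literal form as Wikidata.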

The suggested SPARQL endpoints must satisfy the following conditions:

  • Complies with the "query operation" part of the SPARQL 1.1 Protocol, at least to the extent necessary to make a federated SERVICE clause work (most SPARQL endpoints do).
  • Contains data that can be linked to Wikidata, i.e., it either contains Wikidata IDs or can be queried by values contained in one of the Wikidata properties.
  • Has data freely available under a license compatible with CC0 (preferred) or another free database license allowing unrestricted reuse. Attribution licenses like CC-BY are OK too. Currently, we do not accept endpoints with reuse restriction clauses like NC/ND.
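One rough way to check the first condition before nominating an endpoint is to send it a trivial query through the protocol's query operation. The sketch below only builds the GET URL, using the `query` parameter defined by the SPARQL 1.1 Protocol, and makes no network call; the endpoint URL is a hypothetical placeholder and the `ASK` probe is just one possible smoke test, not an official acceptance check.

```python
from urllib.parse import urlencode

# A trivial probe: true if the endpoint holds at least one triple.
PROBE_QUERY = "ASK { ?s ?p ?o }"


def build_query_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' URL for the given endpoint."""
    # Append to an existing query string if the endpoint URL already has one.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(build_query_url("https://example.org/sparql", PROBE_QUERY))
```

Fetching the resulting URL with any HTTP client and getting back a well-formed SPARQL result suggests the endpoint can at least answer the kind of request a SERVICE clause would send.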

Please post the URL of the endpoint, a short description of it, and, if available, the URL of its documentation. Thank you for helping improve Wikidata.

Implemented endpoints and discussion of endpoints that have been rejected can be found in the Archive.


Suggestions

Licence suitable

Endpoints that are immediately suitable for inclusion.

Attribution licences (like CC-BY)

It looks like attribution licenses are OK too; we will acknowledge them on the licensing page for the service. If for some reason such acknowledgment is not enough, please do not add the endpoint here.

See also:

Licence tbc

Unclear license status; please help us figure it out.

LOD Cloud Cache

Endpoint
http://lod.openlinksw.com/sparql
Documentation
Licence
Background
https://lists.w3.org/Archives/Public/public-lod/2013May/0154.html
https://sourceforge.net/p/virtuoso/mailman/message/32005015/

FactForge

SPARQL Endpoint
http://factforge.net/sparql
Federation endpoint
http://factforge.net/repositories/ff-news
Documentation
http://factforge.net/about
Background
FactForge represents a large scale public demonstrator of many of GraphDB's advanced features: reasoning, geo-spatial indexing, RDFRank, full-text search connectors and owl:sameAs optimization. It loads several LOD datasets in a single GraphDB repository. On top of that, cleanup and other corrections are applied to some of these datasets and ontologies.

3cixty

Endpoint
https://kb.3cixty.com/sparql
Documentation
http://www.eurecom.fr/~troncy/Publications/Rizzo_Troncy-iswc15swc.pdf
Licence
Background
https://www.3cixty.com

3cixty provides comprehensive knowledge bases covering entire territories and cities. It contains millions of triples describing all points of interest, local businesses and events happening in the city. The knowledge base is updated every night. The SPARQL endpoint has had 99% availability over the past 2 years.

--Rtroncy (talk) 20:02, 23 February 2017 (UTC)

@Rtroncy: any idea about the licensing terms? --Smalyshev (WMF) (talk) 19:58, 11 April 2017 (UTC)
@Smalyshev: Sorry for the late reply, strangely, I didn't get any notifications! I control the endpoint. What license would be suitable for you? Rtroncy (talk) 07:35, 6 June 2017 (UTC)
@Rtroncy: CC0 ideally, but we agreed that CC-BY would be fine too, if you're ok with acknowledgement like here: https://query.wikidata.org/copyright.html --Smalyshev (WMF) (talk) 22:03, 6 June 2017 (UTC)

Not suitable

Endpoint suggestions rejected for license or other reasons. May be reconsidered if license or circumstances change.

UniProt

Endpoint
http://sparql.uniprot.org/sparql
Documentation
http://sparql.uniprot.org
Licence
CC by-nd 3.0 for the copyright-able parts
Background
"The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data."

Suggested on Twitter. Unfortunately, licence is "ND" :-( Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:29, 23 February 2017 (UTC)

Also suggested on the Wikidata mailing list on 22 April 2016.

The UniProt database identifiers are already used inside Wikidata via property P352, and a subset of UniProt information has been donated to Wikidata by the UniProt consortium via the GeneWiki process. This SPARQL service runs in multiple datacentres on distributed machines and is a production-grade service with significant existing users. The data is updated monthly in sync with the other UniProt services. Help and user support is available via [help@uniprot.org]. Funding by the NIH and SERI means that this data and service are meant to be used and won't go away if it's too popular. There is a public discussion of usage and setup in this presentation Jerven (talk)

Unfortunately, the "provided you give us credit" part makes it very hard to use this for federation. The code has no way to do the providing credit part, given that the query is supplied by the user and the SPARQL results formats have no provision for providing any credits. That's btw why I don't like attribution clauses on data licenses. It sounds good until you start considering how to use this data in a federated context and it's next to impossible with such license. --Smalyshev (WMF) (talk) 21:09, 24 February 2017 (UTC)
The license applies to the copyrightable parts only. That said, using our IRIs is sufficient. If the result page has the original query, it also gives sufficient credit, as it will show the source of the data via the SERVICE clause. Also, this is copyright we are talking about, so the copying is being done by the user (of both Wikidata and UniProt), and it is up to the user, not the SPARQL endpoints, to obey any laws as required. I.e. the person is fined, not the copy machine; the two SPARQL endpoints are merely the tools. Also, UniProt is not pure data, i.e. it does contain creative and original texts, not just data. --Jerven (talk) 08:13, 27 February 2017 (UTC)
The standard SPARQL result format does not include the original query. Of course, if the query were run in the WDQS GUI, that would show the original query, but the query can also be run via the REST API. And the data fetched by a SPARQL query may involve not only IRIs but any other data accessible by SPARQL, including various transformations of the data. I'm not sure what the rules on complying with attribution and ND are in this case. Of course, the users are ultimately responsible for the queries they run, but since they use Wikimedia infrastructure and results can be published on Wikimedia sites, it may be a source of all kinds of complications. We may need to get some legal advice on that. --Smalyshev (WMF) (talk) 00:25, 28 February 2017 (UTC)
"No derivatives" is very bizarre on a database, if one considers derivatives to include extracts. Presumably extracts are permitted, but only if they are unchanged, and no publication is made of extracts mashed up with any other data -- or altered, or falsified, etc.: I think that is the main reason why non-profit orgs put ND on materials.
I would think one can make such combinations for one's own use, probably including the use of tools like WDQS under one's own direction, provided that the results are not then shared or published. Jheald (talk) 18:22, 28 February 2017 (UTC)

British Museum

Endpoint
http://collection.britishmuseum.org/sparql
Documentation
http://collection.britishmuseum.org/help.html
Licence
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). That's a shame....
Background
http://collection.britishmuseum.org/ . Including this one as not suitable because of the license. Relevant properties: British Museum person-institution (P1711), British Museum place ID (P3633) & British Museum thesaurus ID (P3632). Multichill (talk) 12:34, 1 April 2017 (UTC)
@Multichill: The NC part does not seem to be a problem for us. It may be a problem for somebody downstream, but I'm not sure whether we should be worried about it. If we allow CC-BY now, I think it should be ok? --Smalyshev (WMF) (talk) 19:59, 11 April 2017 (UTC)

External lists

Other discussion

Please discuss on the talk page.

Incoming nominations

The nominations are initially placed here and then sorted and moved into the specific topics above.

OBO/OntoBee

SPARQL Endpoint

http://sparql.hegroup.org/sparql/

Documentation

SPARQL Endpoint serving all OWL ontologies in the OBO Library (http://obofoundry.org)

Licence

The endpoint comprises multiple ontologies with heterogeneous licenses. See https://github.com/OBOFoundry/OBOFoundry.github.io/issues/299 for discussion.

Each ontology is stored in a separate named graph. Each such named graph should have a triple with dc:license or dc:rights indicating the license for that graph.
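If the named graphs do carry such triples, the license of each graph could be listed with a query along these lines. This is only a sketch: the nomination does not say which namespace the `dc:` prefix refers to, so the Dublin Core elements namespace used as the default here is an assumption, and the function name is hypothetical.

```python
def build_graph_license_query(dc_namespace="http://purl.org/dc/elements/1.1/"):
    """Return a SPARQL query listing license/rights statements per named graph.

    dc_namespace is an assumption; pass a different IRI if the endpoint
    uses another vocabulary for its dc: prefix.
    """
    return (
        f"PREFIX dc: <{dc_namespace}>\n"
        "SELECT DISTINCT ?graph ?license WHERE {\n"
        "  GRAPH ?graph {\n"
        # Property path alternative: match either dc:license or dc:rights.
        "    ?ontology dc:license|dc:rights ?license .\n"
        "  }\n"
        "}\n"
        "ORDER BY ?graph"
    )


if __name__ == "__main__":
    print(build_graph_license_query())
```

Graphs that do not appear in the result would need their license checked by hand before the endpoint could be sorted into one of the licence sections above.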

Background

This will provide a powerful way of querying data integrated with Wikidata. Some OBOs are already well-integrated into WD (e.g. DO), others are not yet. Cmungall (talk) 17:57, 13 July 2017 (UTC)