Wikidata:SPARQL tutorial


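The Wikidata Query Service (WDQS) is a powerful tool for gaining insight into Wikidata's content. This guide will teach you how to use it. You can also try the interactive tutorial provided by Wikimedia Israel.

Before writing your own SPARQL query, check {{Item documentation}} and the other generic SPARQL query templates to see whether the query you need already exists.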

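Before we start

This guide may look long and intimidating, but don't let that scare you off. Learning just the basics of SPARQL will take you a long way: even if you stop reading after #Our first query, you will already understand enough to build many interesting queries. Each further section of this tutorial enables you to write more powerful queries.

If you have never heard of Wikidata, SPARQL, or WDQS before, here is a short explanation of those terms.

  • Wikidata is a knowledge database. It contains millions of statements, such as “the capital of Canada is Ottawa”, “the Mona Lisa is painted in oil paint on poplar wood”, or “gold has a melting point of 1,064.18 degrees Celsius”.
  • SPARQL is a language for formulating questions (queries) for knowledge databases. With the right database, a SPARQL query can answer questions like “what is the most popular tonality in music?”, “which character was portrayed by the most actors?”, “what is the distribution of blood types?”, or “which authors' works entered the public domain this year?”.
  • WDQS, the Wikidata Query Service, brings the two together: you enter a SPARQL query, it is run against Wikidata's dataset, and you are shown the result.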

SPARQL basics

A simple SPARQL query looks like this:

SELECT ?a ?b ?c
WHERE
{
  x y ?a.
  m n ?b.
  ?b f ?c.
}

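The SELECT clause lists the variables that you want returned (variables start with a question mark), and the WHERE clause contains restrictions on them, mostly in the form of triples. All information in Wikidata (and similar knowledge databases) is stored in the form of triples. When you run a query, the query service tries to fill in the variables with actual values so that the resulting triples appear in the knowledge database, and it returns one result for each combination of variables it finds.

A triple can be read like a sentence (which is why it ends with a period), with a subject, a predicate, and an object:

SELECT ?fruit
WHERE
{
  ?fruit hasColor yellow.
  ?fruit tastes sour.
}

The results for this query could include, for example, “lemon”. In Wikidata, most properties are “has”-kind properties, so the query would more typically read:

SELECT ?fruit
WHERE
{
  ?fruit color yellow.
  ?fruit taste sour.
}

which reads like “?fruit has color ‘yellow’” (not “?fruit is the color of ‘yellow’”; keep this in mind for property pairs like “parent”/“child”!).

However, this is not a good example for WDQS. Taste is subjective, so Wikidata doesn't have a property for it. Instead, let's think about parent/child relationships, which are mostly unambiguous.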

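Our first query

Suppose we want to list all children of the Baroque composer Johann Sebastian Bach. Using pseudo-elements like in the queries above, how would you write that query?

Hopefully you got something like this:

SELECT ?child
WHERE
{
  #  child "has parent" Bach
  ?child parent Bach.
  # (note: everything after a '#' is a comment and is ignored by WDQS.)
}

or this,

SELECT ?child
WHERE
{
  # child (variable) "has father" Bach
  ?child father Bach.
}

or this,

SELECT ?child
WHERE
{
  #  Bach "has child" child (variable)
  Bach child ?child.
}

The first two triples say that ?child must have the parent/father Bach; the third says that Bach must have the child ?child. Let's go with the second one for now.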

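So what remains to be done to turn this into a proper WDQS query? On Wikidata, items and properties are not identified by human-readable names like “father” (property) or “Bach” (item). There is a good reason for this: “Johann Sebastian Bach” is also the name of a German painter, and “Bach” might also refer to the surname, the French commune, the Mercury crater, and so on. To find the identifier of an item, we search for the item and copy the Q-number of the result that looks like the item we want (judging by the description, for example). To find the identifier of a property, we do the same, but search for “P:search term” instead of just “search term”, which limits the search to properties. This tells us that the famous composer Johann Sebastian Bach is Q1339, and that the property designating an item's father is P22.

Last but not least, we need to add prefixes. For simple WDQS triples, items are prefixed with wd: and properties with wdt:. This only applies to fixed values; variables don't get a prefix.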

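Putting this together, we arrive at our first proper WDQS query:

SELECT ?child
WHERE
{
# ?child father Bach.
  ?child wdt:P22 wd:Q1339.
}
Try it!

Click that “Try it!” link, then “Run” the query on the WDQS page. What do you get?

child
wd:Q57225
wd:Q76428

Well, that's disappointing. You only see the identifiers. You can click on them to see their Wikidata page (including a human-readable label), but isn't there a better way to see the results?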

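Don't worry, there is. Just include the following magic text

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }

somewhere within the WHERE clause, and you get additional variables: for every variable ?abc in your query, you now also have a variable ?abcLabel, which contains the label of the item behind ?abc. If you add this variable to the SELECT clause, you get the item as well as its label: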

SELECT ?child ?childLabel
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

Try running that query: you should now see not only the item numbers, but also the names of the various children.

child childLabel
wd:Q57225 Johann Christoph Friedrich Bach
wd:Q76428 Carl Philipp Emanuel Bach

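Auto-completion

That SERVICE snippet looks hard to remember, right? And going through the search function all the time while you're writing a query is also tedious. Fortunately, WDQS offers a solution for this: auto-completion. In the query.wikidata.org query editor, you can press Ctrl+Space (or Alt+Enter, or Ctrl+Alt+Enter) at any point in the query to get suggestions for code that might be appropriate; select the right suggestion with the up/down arrow keys, and press Enter to accept it.

For example, instead of writing out SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } every time, you can just type SERV, hit Ctrl+Space, and the first suggestion will be that complete statement, ready for use! Just hit Enter to accept it. (The formatting may be a bit different, but that doesn't matter.)

Auto-completion can also search for you. If you type one of the Wikidata prefixes, like wd: or wdt:, and then write some text after it, Ctrl+Space will search for that text on Wikidata and suggest results. wd: searches for items, wdt: for properties. For example, instead of looking up the identifiers for Johann Sebastian Bach (Q1339) and father (P22), you can just type wd:Bach and wdt:fath and then select the right entry from the auto-completion. This even works with spaces in the text, e.g. wd:Johann Sebastian Bach.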

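Advanced triple patterns

So now we've seen all children of Johann Sebastian Bach, or more precisely, all items whose father is Johann Sebastian Bach. But Bach had two wives, so those items have different mothers. What if we only want to see the children he had with his first wife, Maria Barbara Bach (Q57487)? Try writing that query, based on the one above.

Done that? Then let's look at the solution. The simplest way to do this is to add a second triple with that restriction: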

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.
  ?child wdt:P25 wd:Q57487.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

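This can be read as:

Child has father Johann Sebastian Bach.

Child has mother Maria Barbara Bach.

That sounds a bit awkward, doesn't it? In natural language, we would say:

Child has father Johann Sebastian Bach and mother Maria Barbara Bach.

In fact, SPARQL supports a similar abbreviation: if you end a triple with a semicolon (;) instead of a period, you can add another predicate-object pair. This allows us to abbreviate the above query to: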

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
         wdt:P25 wd:Q57487.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

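This returns the same results, with less repetition in the query.

Now suppose that, out of those results, we're only interested in the children who were both composers and pianists. The relevant property and items are occupation (P106), composer (Q36834), and pianist (Q486748). Try updating the above query to add these restrictions.

Here's my solution: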

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
         wdt:P25 wd:Q57487;
         wdt:P106 wd:Q36834;
         wdt:P106 wd:Q486748.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

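This uses the ; abbreviation two more times to add the two required occupations. But as you might notice, there is still some repetition. It's as if we had said:

Child has occupation composer and occupation pianist.

which we would usually abbreviate as:

Child has occupation composer and pianist.

SPARQL has syntax for that as well: just like a ; allows you to append a predicate-object pair to a triple (reusing the subject), a , allows you to append another object to a triple (reusing both subject and predicate). With this, the query can be abbreviated to: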

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
         wdt:P25 wd:Q57487;
         wdt:P106 wd:Q36834,
                  wd:Q486748.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

Note: indentation and other whitespace don't actually matter; they just make the query more readable. You can also write it as:

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
         wdt:P25 wd:Q57487;
         wdt:P106 wd:Q36834, wd:Q486748.
  # both occupations on one line
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

or, rather less readably:

SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339;
  wdt:P25 wd:Q57487;
  wdt:P106 wd:Q36834,
  wd:Q486748.
  # no indentation; this makes it hard to tell , and ; apart
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

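Luckily, the WDQS editor indents lines for you automatically, so you usually don't have to worry about this.

Let's summarize. Queries are structured like text. Each group of triples about a subject is terminated by a period. Multiple predicates about the same subject are separated by semicolons, and multiple objects for the same subject and predicate are separated by commas.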

SELECT ?s1 ?s2 ?s3
WHERE
{
  ?s1 p1 o1;
      p2 o2;
      p3 o31, o32, o33.
  ?s2 p4 o41, o42.
  ?s3 p5 o5;
      p6 o6.
}

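Now I want to introduce one more abbreviation that SPARQL offers. Let me set up one more hypothetical scenario.

Suppose we're not actually interested in Bach's children, but in his grandchildren. There is one complication here: a grandchild may be related to Bach via the mother or via the father. Those are two different properties, which is inconvenient. Instead, let's flip the relation around: Wikidata also has a “child” property, P40, which points from parent to child and is gender-independent. With this information, can you write a query that returns Bach's grandchildren?

Here's my solution: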

SELECT ?grandChild ?grandChildLabel
WHERE
{
  wd:Q1339 wdt:P40 ?child.
  ?child wdt:P40 ?grandChild.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

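In natural language, this reads:

Bach has a child ?child.

?child has a child ?grandChild.

Let's try to abbreviate this as well. We don't actually care about the child in the middle; we only use that variable to talk about the grandchild. So instead of using an intermediate variable, we can abbreviate the sentence to:

Bach has as child someone who has a child ?grandChild.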

Instead of saying who Bach’s child is, we just say “someone”: we don’t care who it is. But we can refer back to them because we’ve said “someone who”: this starts a relative clause, and within that relative clause we can say things about “someone” (e.g., that they “have a child ?grandChild”). In a way, “someone” is a variable, but a special one that’s only valid within this relative clause, and one that we don’t explicitly refer to (we say “someone who is this and does that”, not “someone who is this and someone who does that” – that’s two different “someone”s).

In SPARQL, this can be written as:

SELECT ?grandChild ?grandChildLabel
WHERE
{
  wd:Q1339 wdt:P40 [ wdt:P40 ?grandChild ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

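You can use a pair of brackets ([]) in place of a variable, which acts as an anonymous variable. Inside the brackets, you can specify predicate-object pairs, just like after a ; following a normal triple; in this case, the implicit subject is the anonymous variable that the brackets represent. Also just like after a ;, you can add more predicate-object pairs with more semicolons, or more objects for the same predicate with commas.

And that's it for triple patterns. There is more to SPARQL, but as we're about to leave the parts of it that are strongly analogous to natural language, I'd like to summarize that relationship once more:

natural language | example | SPARQL | example
sentence | Juliet loves Romeo. | period | juliet loves romeo.
conjunction (clause) | Romeo loves Juliet and kills himself. | semicolon | romeo loves juliet; kills romeo.
conjunction (noun) | Romeo kills Tybalt and himself. | comma | romeo kills tybalt, romeo.
relative clause | Juliet loves someone who kills Tybalt. | brackets | juliet loves [ kills tybalt ].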

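Instances and classes

Earlier, I said that most Wikidata properties are “has” relations: has child, has father, has occupation. But sometimes (in fact, frequently), you also need to talk about what something is. And there are in fact two kinds of relations there:

  • Gone with the Wind is a film.
  • A film is a work of art.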

Gone with the Wind is one particular film. It has a particular director (Victor Fleming), a specific duration (238 minutes), a list of cast members (Clark Gable, Vivien Leigh, …), and so on.

Film is a general concept. Films can have directors, durations, and cast members, but the concept “film” as such does not have any particular director, duration, or cast members. And although a film is a work of art, and a work of art usually has a creator, the concept of “film” itself does not have a creator – only particular instances of this concept do.

This difference is why there are two properties for “is” in Wikidata: instance of (P31) and subclass of (P279). Gone with the Wind is a particular instance of the class “film”; the class “film” is a subclass (a more specific class, a specialization) of the more general class “work of art”.

To help you figure out the difference, you can try two different verbs: "is a" and "is a kind of". If "is a kind of" works (e.g. a film "is a kind of" work of art), you are talking about a subclass, a specialization of a broader class, and you should use subclass of (P279). If "is a kind of" does not work (e.g. the sentence Gone with the Wind "is a kind of" film does not make sense), you are talking about a particular instance and you should use instance of (P31).

So what does this mean for us when we're writing SPARQL queries? When we want to search for “all works of art”, it is not enough to search for all items that are directly instances of “work of art”:

SELECT ?work ?workLabel
WHERE
{
  ?work wdt:P31 wd:Q838948. # instance of work of art
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

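As I'm writing this (October 2016), that query only returns 2,815 results. Obviously, there are more works of art than that! The problem is that this misses items like Gone with the Wind, which is only an instance of “film”, not of “work of art”. “Film” is a subclass of “work of art”, but we need to tell SPARQL to take that into account when searching.

One possible solution is the [] syntax we talked about: Gone with the Wind is an instance of some subclass of “work of art”. (As an exercise, try writing that query!) But that still has problems:

  1. We're no longer including items that are directly instances of “work of art”.
  2. We're still missing items that are instances of some subclass of some other subclass of “work of art”. For example, Snow White and the Seven Dwarfs is an animated film, which is a film, which is a work of art. In this case, we need to follow two “subclass of” statements, but it might also be three, four, or any other number.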
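
As a rough sketch of the bracket-based attempt just described, the query could look like the following. It only reuses identifiers that already appear in this tutorial (P31, P279, Q838948), and it has both of the problems listed above, because it follows exactly one “subclass of” step:

```sparql
SELECT ?work ?workLabel
WHERE
{
  # ?work is an instance of something that is a direct subclass of work of art
  ?work wdt:P31 [ wdt:P279 wd:Q838948 ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```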

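The solution: ?item wdt:P31/wdt:P279* ?class. This means that there is one “instance of” statement and then any number of “subclass of” statements between the item and the class.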

SELECT ?work ?workLabel
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q838948. # instance of any subclass of work of art
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

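(I don't recommend running that query. WDQS can just barely handle it, but your browser might crash when trying to display the results, because there are so many of them.)

Now you know how to search for all works of art, all buildings, or all human settlements: the magic incantation wdt:P31/wdt:P279*, along with the appropriate class. This uses some SPARQL features that I haven't explained yet, but this is almost the only relevant use of those features, so you don't need to understand how it works in order to use WDQS effectively. If you want to know, I'll explain it in a moment; you can also skip the next section and just copy and paste wdt:P31/wdt:P279* when you need it.

Property paths

Property paths are a very terse way to write down a path of properties between two items. The simplest path is just a single property, which forms an ordinary triple: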

?item wdt:P31 ?class.

You can add path elements with a forward slash (/).

?item wdt:P31/wdt:P279/wdt:P279 ?class.

This is equivalent to either of the following:

?item wdt:P31 ?temp1.
?temp1 wdt:P279 ?temp2.
?temp2 wdt:P279 ?class.

?item wdt:P31 [ wdt:P279 [ wdt:P279 ?class ] ].

Exercise: rewrite the “grandchildren of Bach” query from before to use this syntax.
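
One possible way to write it, as a sketch that only reuses Q1339 and P40 from the earlier grandchildren query, is to chain two “child” steps with a forward slash:

```sparql
SELECT ?grandChild ?grandChildLabel
WHERE
{
  # two "child" steps in a row: Bach -> (some child) -> grandchild
  wd:Q1339 wdt:P40/wdt:P40 ?grandChild.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

This should match the same items as the versions with ?child or with brackets, since the intermediate child is never selected in any of them.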

An asterisk (*) after a path element means “zero or more of this element”.

?item wdt:P31/wdt:P279* ?class.
# means:
?item wdt:P31 ?class
# or
?item wdt:P31/wdt:P279 ?class
# or
?item wdt:P31/wdt:P279/wdt:P279 ?class
# or
?item wdt:P31/wdt:P279/wdt:P279/wdt:P279 ?class
# or ...

In the special case where a path has zero elements, the subject node and the object node are the same node: a zero-length path connects every node to itself, so there is always a match. Thus, ?a something* ?b also matches when ?a and ?b are bound to the same value, with no statement between them.

A plus (+) is similar to an asterisk, but means “one or more of this element”. The following query finds all descendants of Bach:

SELECT ?descendant ?descendantLabel
WHERE
{
  wd:Q1339 wdt:P40+ ?descendant.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

If we used an asterisk instead of a plus here, the query results would include Bach himself.

A question mark (?) is similar to an asterisk or a plus, but means “zero or one of this element”.
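
To illustrate the question mark, here is a small sketch (reusing Q1339 and P40 from above) that should return Bach's children and grandchildren in a single query: the first “child” step is required, the second one may be present or absent.

```sparql
SELECT ?descendant ?descendantLabel
WHERE
{
  # one mandatory child step, followed by zero or one further child step
  wd:Q1339 wdt:P40/wdt:P40? ?descendant.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```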

You can separate path elements with a vertical bar (|) instead of a forward slash; this means “either-or”: the path might use either of those properties. (But not combined – an either-or path segment always matches a path of length one.)

You can also group path elements with parentheses (()), and freely combine all these syntax elements (/|*+?). This means that another way to find all descendants of Bach is:

SELECT ?descendant ?descendantLabel
WHERE
{
  ?descendant (wdt:P22|wdt:P25)+ wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

Instead of using the “child” property to go from Bach to his descendants, we use the “father” and “mother” properties to go from the descendants to Bach. The path might include two mothers and one father, or four fathers, or father-mother-mother-father, or any other combination. (Though, of course, Bach can’t be the mother of someone, so the last element will always be father.)

Qualifiers

(Good news first: this section introduces no additional SPARQL syntax – yay! Take a quick breath and relax, this should be a piece of cake. Right?)

So far, we’ve only talked about simple statements: subject, property, object. But Wikidata statements are more than that: they can also have qualifiers and references. For example, the Mona Lisa (Q12418) has three made from material (P186) statements:

  1. oil paint (Q296955), the main material;
  2. poplar wood (Q291034), with the qualifier applies to part (P518): painting support (Q861259) – this is the material that the Mona Lisa was painted on; and
  3. wood (Q287), with the qualifiers applies to part (P518): stretcher (Q1737943) and start time (P580): 1951 – this is a part that was added to the painting later.

Suppose we want to find all paintings with their painting surface, that is, those made from material (P186) statements with the qualifier applies to part (P518): painting support (Q861259). How do we do that? That’s more information than can be represented in a single triple.

The answer is: more triples! (Rule of thumb: Wikidata’s solution for almost everything is “more items”, and the corresponding WDQS rule is “more triples”. References, numeric precision, values with units, geocoordinates, etc., all of which we’re skipping here, also work like this.) So far, we’ve used the wdt: prefix for our statement triples, which points directly to the object of the statement. But there’s also another prefix: p:, which points not to the object, but to a statement node. This node then is the subject of other triples: the prefix ps: (for property statement) points to the statement object, the prefix pq: (property qualifier) to qualifiers, and prov:wasDerivedFrom points to reference nodes (which we’ll ignore for now).

That was a lot of abstract text. Here’s a concrete example for the Mona Lisa:

wd:Q12418 p:P186 ?statement1.    # Mona Lisa: material used: ?statement1
?statement1 ps:P186 wd:Q296955.  # value: oil paint

wd:Q12418 p:P186 ?statement2.    # Mona Lisa: material used: ?statement2
?statement2 ps:P186 wd:Q291034.  # value: poplar wood
?statement2 pq:P518 wd:Q861259.  # qualifier: applies to part: painting surface

wd:Q12418 p:P186 ?statement3.    # Mona Lisa: material used: ?statement3
?statement3 ps:P186 wd:Q287.     # value: wood
?statement3 pq:P518 wd:Q1737943. # qualifier: applies to part: stretcher bar
?statement3 pq:P580 1951.        # qualifier: start time: 1951 (pseudo-syntax)

We can abbreviate this a lot with the [] syntax, replacing the ?statement variables:

wd:Q12418 p:P186 [ ps:P186 wd:Q296955 ].

wd:Q12418 p:P186 [
            ps:P186 wd:Q291034;
            pq:P518 wd:Q861259
          ].

wd:Q12418 p:P186 [
            ps:P186 wd:Q287;
            pq:P518 wd:Q1737943;
            pq:P580 1951
          ].

Can you use this knowledge to write a query for all paintings with their painting surface?

Here’s my solution:

SELECT ?painting ?paintingLabel ?material ?materialLabel
WHERE
{
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
            p:P186 [ ps:P186 ?material; pq:P518 wd:Q861259 ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

First, we limit ?painting to all instances of painting (Q3305213) or a subclass thereof. Then, we extract the material from the p:P186 statement node, limiting the statements to those that have the qualifier applies to part (P518): painting support (Q861259).
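
The same statement-node pattern also works with a named variable instead of brackets, which is handy when you want the qualifier value itself in the results. As a sketch using only the identifiers from the Mona Lisa example above, this lists each material together with the part it applies to:

```sparql
SELECT ?material ?materialLabel ?part ?partLabel
WHERE
{
  wd:Q12418 p:P186 ?statement.   # Mona Lisa: made from material: statement node
  ?statement ps:P186 ?material;  # the value of that statement
             pq:P518 ?part.      # its "applies to part" qualifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

Note that a statement without an “applies to part” qualifier (such as the oil paint one) does not match this pattern, so it would not appear in the results.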

ORDER and LIMIT

We return to our regularly scheduled program of more SPARQL features.

So far, we’ve only had queries where we were interested in all results. But it’s quite common to care only about a few results: those that are most extreme in some way – oldest, youngest, earliest, latest, highest population, lowest melting point, most children, most materials used, and so on. The common factor here is that the results are ranked in some way, and then we care about the first few results (those with the best rank).

This is controlled by two clauses, which are appended to the WHERE {} block (after the braces, not inside!): ORDER BY and LIMIT.

ORDER BY something sorts the results by something. something can be any expression – for now, the only kind of expression we know are simple variables (?something), but we’ll see some other kinds later. This expression can also be wrapped in either ASC() or DESC() to specify the sorting order (ascending or descending). (If you don’t specify either, the default is ascending sort, so ASC(something) is equivalent to just something.)

LIMIT count cuts off the result list at count results, where count is any natural number. For example, LIMIT 10 limits the query to ten results. LIMIT 1 only returns a single result.

(You can also use LIMIT without ORDER BY. In this case, the results aren’t sorted, so you don’t have any guarantee which results you’ll get. Which is fine if you happen to know that there’s only a certain number of results, or you’re just interested in some result, but don’t care about which one. In either case, adding the LIMIT can significantly speed up the query, since WDQS can stop searching for results as soon as it’s found enough to fill the limit.)
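
For instance, a minimal sketch of LIMIT without ORDER BY, based on the earlier query for Bach's children: it returns at most three children, with no guarantee as to which three.

```sparql
SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 3
```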

Exercise time! Try to write a query that returns the ten most populous countries. (A country is a sovereign state (Q3624078), and the property for population is P1082.) You can start by searching for countries with their population, and then add the ORDER BY and LIMIT clauses.

Here’s my solution:

SELECT ?country ?countryLabel ?population
WHERE
{
  ?country wdt:P31/wdt:P279* wd:Q3624078;
           wdt:P1082 ?population.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT 10
Try it!

Note that if we want the most populous countries, we have to order by descending population, so that the first results will be the ones with the highest values.
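
Conversely, a sketch of the opposite question (the ten least populous countries) only needs the sort direction changed. ASC() is written out here for clarity, although ascending order is the default:

```sparql
SELECT ?country ?countryLabel ?population
WHERE
{
  ?country wdt:P31/wdt:P279* wd:Q3624078;
           wdt:P1082 ?population.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(?population)  # equivalent to ORDER BY ?population
LIMIT 10
```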

Exercises

We've covered a lot of ground so far, so it's time for some exercises. (You can skip this section if you're in a hurry.)

Arthur Conan Doyle books

Write a query that returns all books by Sir Arthur Conan Doyle.

Chemical elements

Write a query that returns all chemical elements with their element symbol and atomic number, in order of their atomic number.
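
One possible solution is sketched below. The identifiers are not given anywhere in this tutorial; they are assumptions recalled from Wikidata (chemical element = Q11344, element symbol = P246, atomic number = P1086), so check them with the auto-completion before relying on the query.

```sparql
SELECT ?element ?elementLabel ?symbol ?number
WHERE
{
  ?element wdt:P31 wd:Q11344;   # assumed: Q11344 = chemical element
           wdt:P246 ?symbol;    # assumed: P246 = element symbol
           wdt:P1086 ?number.   # assumed: P1086 = atomic number
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
ORDER BY ?number
```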

Rivers that flow into the Mississippi

Write a query that returns all rivers that flow directly into the Mississippi River. (The main challenge is finding the correct property…)

Rivers that flow into the Mississippi II

Write a query that returns all rivers that flow into the Mississippi River, directly or indirectly.
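
A possible solution for both river exercises is sketched here. The identifiers are assumptions recalled from Wikidata rather than taken from this tutorial (river = Q4022, mouth of the watercourse = P403, Mississippi River = Q1497), so verify them with the auto-completion first.

```sparql
SELECT ?river ?riverLabel
WHERE
{
  ?river wdt:P31/wdt:P279* wd:Q4022;  # assumed: Q4022 = river
         wdt:P403+ wd:Q1497.          # assumed: P403 = mouth of the watercourse, Q1497 = Mississippi River
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```

With the plus sign, the path follows one or more “mouth” steps, which covers rivers that reach the Mississippi indirectly. Writing plain wdt:P403 instead should restrict the results to rivers that flow into it directly.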

OPTIONAL

In the exercises above, we had a query for all books by Sir Arthur Conan Doyle:

SELECT ?book ?bookLabel
WHERE
{
  ?book wdt:P50 wd:Q35610.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

But that’s a bit boring. There’s so much potential data about books, and we only show the label? Let’s try to craft a query that also includes the title (P1476), illustrator (P110), publisher (P123) and publication date (P577).

A first attempt might look like this:

SELECT ?book ?title ?illustratorLabel ?publisherLabel ?published
WHERE
{
  ?book wdt:P50 wd:Q35610;
        wdt:P1476 ?title;
        wdt:P110 ?illustrator;
        wdt:P123 ?publisher;
        wdt:P577 ?published.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

Run that query. As I’m writing this, it only returns two results – a bit meager! Why is that? We found over a hundred books earlier!

The reason is that to match this query, a potential result (a book) must match all the triples we listed: it must have a title, and an illustrator, and a publisher, and a publication date. If it has some of those properties, but not all of them, it won’t match. And that’s not what we want in this case: we primarily want a list of all the books – if additional data is available, we’d like to include it, but we don’t want that to limit our list of results.

The solution is to tell WDQS that those triples are optional:

SELECT ?book ?title ?illustratorLabel ?publisherLabel ?published
WHERE
{
  ?book wdt:P50 wd:Q35610.
  OPTIONAL { ?book wdt:P1476 ?title. }
  OPTIONAL { ?book wdt:P110 ?illustrator. }
  OPTIONAL { ?book wdt:P123 ?publisher. }
  OPTIONAL { ?book wdt:P577 ?published. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

This gives us the additional variables (?title, ?publisher etc.) if the appropriate statement exists, but if the statement doesn’t exist, the result isn’t discarded – the variable simply isn’t set.

Note: it’s very important to use separate OPTIONAL clauses here. If you put all the triples into a single clause, like here –

SELECT ?book ?title ?illustratorLabel ?publisherLabel ?published
WHERE
{
  ?book wdt:P50 wd:Q35610.
  OPTIONAL {
    ?book wdt:P1476 ?title;
          wdt:P110 ?illustrator;
          wdt:P123 ?publisher;
          wdt:P577 ?published.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!

– you’ll notice that most of the results don’t include any extra information. This is because an optional clause with multiple triples only matches when all those triples can be satisfied. That is: if a book has a title, an illustrator, a publisher, and a publication date, then the optional clause matches, and those values are assigned to the appropriate variables. But if a book has, for example, a title but no illustrator, the entire optional clause doesn’t match, and although the result isn’t discarded, all four variables remain empty.

Expressions, FILTER and BIND

This section might seem a bit less organized than the other ones, because it covers a fairly wide and diverse topic. The basic concept is that we would like to do something with the values that, so far, we’ve just selected and returned indiscriminately. And expressions are the way to express these operations on values. There are many kinds of expressions, and a lot of things you can do with them – but first, let’s start with the basics: data types.

Data types

Each value in SPARQL has a type, which tells you what kind of value it is and what you can do with it. The most important types are:

  • item, like wd:Q42 for Douglas Adams (Q42).
  • boolean, with the two possible values true and false. Boolean values aren’t stored in statements, but many expressions return a boolean value, e.g. 2 < 3 (true) or "a" = "b" (false).
  • string, a piece of text. String literals are written in double quotes.
  • monolingual text, a string with a language tag attached. In a literal, you can add the language tag after the string with an @ sign, e.g. "Douglas Adams"@en.
  • numbers, either integers (1) or decimals (1.23).
  • dates. Date literals can be written by adding ^^xsd:dateTime (case sensitive – ^^xsd:datetime won’t work!) to an ISO 8601 date string: "2012-10-29"^^xsd:dateTime.

Operators

The familiar mathematical operators are available: +, -, *, / to add, subtract, multiply or divide numbers, <, >, =, <=, >= to compare them. The inequality test ≠ is written !=. Comparison is also defined for other types; for example, "abc" < "abd" is true (lexical comparison), as is "2016-01-01"^^xsd:dateTime > "2015-12-31"^^xsd:dateTime and wd:Q4653 != wd:Q283111. And boolean conditions can be combined with && (logical and: a && b is true if both a and b are true) and || (logical or: a || b is true if either (or both) of a and b is true).

FILTER

Info: MINUS is a sometimes faster alternative to FILTER.

FILTER(condition). is a clause you can insert into your SPARQL query to filter the results. Inside the parentheses, you can put any expression of boolean type, and only those results where the expression returns true are used.

For example, to get a list of all humans born in 2015, we first get all humans with their date of birth –

SELECT ?person ?personLabel ?dob
WHERE
{
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?dob.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

– and then filter that to only return the results where the year of the date of birth is 2015. There are two ways to do that: extract the year of the date with the YEAR function, and test that it’s 2015 –

FILTER(YEAR(?dob) = 2015).

– or check that the date is between January 1st, 2015 (inclusive) and January 1st, 2016 (exclusive):

FILTER("2015-01-01"^^xsd:dateTime <= ?dob && ?dob < "2016-01-01"^^xsd:dateTime).

I’d say that the first one is more straightforward, but it turns out the second one is much faster, so let’s use that:

SELECT ?person ?personLabel ?dob
WHERE
{
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?dob.
  FILTER("2015-01-01"^^xsd:dateTime <= ?dob && ?dob < "2016-01-01"^^xsd:dateTime).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". } 
}
Try it!

Another possible use of FILTER is related to labels. The label service is very useful if you just want to display the label of a variable. But if you want to do stuff with the label – for example: check if it starts with “Mr. ” – you’ll find that it doesn’t work:

SELECT ?human ?humanLabel
WHERE
{
  ?human wdt:P31 wd:Q15632617.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  #This FILTER does not work!
  FILTER(STRSTARTS(?humanLabel, "Mr. ")).
}
Try it!

This query finds all instances of fictional human (Q15632617) and tests if their label starts with "Mr. " (STRSTARTS is short for “string starts [with]”; there’s also STRENDS and CONTAINS). The reason why this doesn’t work is that the label service adds its variables very late during query evaluation; at the point where we try to filter on ?humanLabel, the label service hasn’t created that variable yet.

Fortunately, the label service isn’t the only way to get an item’s label. Labels are also stored as regular triples, using the predicate rdfs:label. Of course, this means all labels, not just English ones; if we only want English labels, we’ll have to filter on the language of the label:

FILTER(LANG(?label) = "en").

The LANG function returns the language of a monolingual string, and here we only select those labels that are in English. The full query is:

SELECT ?human ?label
WHERE
{
  ?human wdt:P31 wd:Q15632617;
         rdfs:label ?label.
  FILTER(LANG(?label) = "[AUTO_LANGUAGE]").
  FILTER(STRSTARTS(?label, "Mr. ")).
}
Try it!

We get the label with the ?human rdfs:label ?label triple, restrict it to English labels, and then check if it starts with “Mr. ”.

One can also use FILTER with a regular expression. The following example keeps only those ?bblid values that contain neither a period nor the letter q:

SELECT ?item ?itemLabel ?bblid
WHERE {  
    ?item wdt:P2580 ?bblid .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }  
    FILTER(!REGEX(STR(?bblid), "[\\.q]")) 
}
Try it!

If the format constraint for an ID is [A-Za-z][-.0-9A-Za-z]{1,}, the following query finds the values that do not conform to it:

SELECT ?item ?itemLabel ?bblid
WHERE {  
    ?item wdt:P2580 ?bblid .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }  
    FILTER(!REGEX(STR(?bblid), "^[A-Za-z][-.0-9A-Za-z]{1,}$"))
}
Try it!

It is possible to filter out specific elements like this:

FILTER ( ?item not in ( wd:Q4115189,wd:Q13406268,wd:Q15397819 ) )

It is also possible to keep only the results for which a property is not set at all (here, items without any P21 statement):

FILTER ( NOT EXISTS { ?item  wdt:P21 [] } )
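
To show such a filter in context, here is a small sketch built from identifiers used earlier in this tutorial (father P22, mother P25, Bach Q1339). It should list the children of Bach for whom no mother is recorded; the commented-out MINUS line is an alternative formulation that removes the same results in this particular query.

```sparql
SELECT ?child ?childLabel
WHERE
{
  ?child wdt:P22 wd:Q1339.
  # keep only children for whom no mother (P25) statement exists
  FILTER NOT EXISTS { ?child wdt:P25 []. }
  # alternative: subtract every ?child that has a mother statement
  # MINUS { ?child wdt:P25 []. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
```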


BIND, BOUND, IF

These three features are often used in conjunction, so I’ll first explain all three of them and then show you some examples.

A BIND(expression AS ?variable). clause can be used to assign the result of an expression to a variable (usually a new variable, but you can also overwrite existing ones).

BOUND(?variable) tests if a variable has been bound to a value (returns true or false). It’s mostly useful on variables that are introduced in an OPTIONAL clause.

IF(condition,thenExpression,elseExpression) evaluates to thenExpression if condition evaluates to true, and to elseExpression if condition evaluates to false. That is, IF(true, "yes", "no") evaluates to "yes", and IF(false, "great", "terrible") evaluates to "terrible".

BIND can be used to bind the results of some calculation to a new variable. This can be an intermediate result of a larger calculation or just directly a result of the query. For example, to get the age of victims of capital punishment:

SELECT ?person ?personLabel ?age
WHERE
{
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?born;
          wdt:P570 ?died;
          wdt:P1196 wd:Q8454.
  BIND(?died - ?born AS ?ageInDays).
  BIND(?ageInDays/365.2425 AS ?ageInYears).
  BIND(FLOOR(?ageInYears) AS ?age).
  # or, as one expression:
  #BIND(FLOOR((?died - ?born)/365.2425) AS ?age).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

BIND can also be used to simply bind constant values to variables in order to increase readability. For example, a query that finds all female priests:

SELECT ?woman ?womanLabel
WHERE
{
  ?woman wdt:P31 wd:Q5;
         wdt:P21 wd:Q6581072;
         wdt:P106 wd:Q42603.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

can be rewritten like this:

SELECT ?woman ?womanLabel
WHERE
{
  BIND(wdt:P31 AS ?instanceOf).
  BIND(wd:Q5 AS ?human).
  BIND(wdt:P21 AS ?sexOrGender).
  BIND(wd:Q6581072 AS ?female).
  BIND(wdt:P106 AS ?occupation).
  BIND(wd:Q42603 AS ?priest).
  ?woman ?instanceOf ?human;
         ?sexOrGender ?female;
         ?occupation ?priest.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

The meaningful part of the query, from ?woman to ?priest., is now probably more readable. However, the large BIND block right in front of it is pretty distracting, so this technique should be used sparingly. (In the WDQS user interface, you can also hover your mouse over any term like wd:Q123 or wdt:P123 and see the label and description for the entity, so ?female is only more readable than wd:Q6581072 if you ignore that feature.)

IF expressions are often used with condition-expressions built with BOUND. For example, suppose you have a query that shows some humans, and instead of just showing their label, you’d like to display their pseudonym (P742) if they have one, and only use the label if a pseudonym doesn’t exist. For this, you select the pseudonym in an OPTIONAL clause (it has to be optional – you don’t want to throw out results that don’t have a pseudonym), and then use BIND(IF(BOUND(… to select either the pseudonym or the label.

SELECT ?writer ?label
WHERE
{
  # French writer born in the second half of the 18th century
  ?writer wdt:P31 wd:Q5;
          wdt:P27 wd:Q142;
          wdt:P106 wd:Q36180;
          wdt:P569 ?dob.
  FILTER("1751-01-01"^^xsd:dateTime <= ?dob && ?dob < "1801-01-01"^^xsd:dateTime).
  # get the English label
  ?writer rdfs:label ?writerLabel.
  FILTER(LANG(?writerLabel) = "en").
  # get the pseudonym, if it exists
  OPTIONAL { ?writer wdt:P742 ?pseudonym. }
  # bind the pseudonym, or if it doesn’t exist the English label, as ?label
  BIND(IF(BOUND(?pseudonym),?pseudonym,?writerLabel) AS ?label).
}
Try it!

Other properties that may be used in this way include nickname (P1449), posthumous name (P1786), and taxon common name (P1843) – anything where some sort of “fallback” makes sense.

You can also combine BOUND with FILTER to ensure that at least one of several OPTIONAL blocks has been fulfilled. For example, let’s get all astronauts that went to the moon, as well as the members of Apollo 13 (Q182252) (close enough, right?). That restriction can’t be expressed as a single property path, so we need one OPTIONAL clause for “member of some moon mission” and another one for “member of Apollo 13”. But we only want to select those results where at least one of those conditions is true.

SELECT ?astronaut ?astronautLabel
WHERE
{
  ?astronaut wdt:P31 wd:Q5;
             wdt:P106 wd:Q11631.
  OPTIONAL {
    ?astronaut wdt:P450 ?mission.
    ?mission wdt:P31 wd:Q495307.
  }
  OPTIONAL {
    ?astronaut wdt:P450 wd:Q182252.
    BIND(wd:Q182252 AS ?mission).
  }
  FILTER(BOUND(?mission)).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
Try it!

COALESCE

The COALESCE function can be used as an abbreviation of the BIND(IF(BOUND(?x), ?x, ?y) AS ?z). pattern for fallbacks mentioned above: it takes a number of expressions and returns the first one that evaluates without error. For example, the above “pseudonym” fallback

BIND(IF(BOUND(?pseudonym),?pseudonym,?writerLabel) AS ?label).

can be written more concisely as

BIND(COALESCE(?pseudonym, ?writerLabel) AS ?label).

and it’s also easy to add another fallback label in case the ?writerLabel isn’t defined either:

BIND(COALESCE(?pseudonym, ?writerLabel, "<no label>") AS ?label).
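For context, here is a sketch of the complete pseudonym query from above with the BIND(IF(BOUND(… line swapped for COALESCE. One change is an assumption of this sketch rather than part of the original query: the English label is moved into an OPTIONAL block, since otherwise writers without an English label would be dropped before the "<no label>" fallback could ever apply.

```sparql
SELECT ?writer ?label
WHERE
{
  # French writer born in the second half of the 18th century
  ?writer wdt:P31 wd:Q5;
          wdt:P27 wd:Q142;
          wdt:P106 wd:Q36180;
          wdt:P569 ?dob.
  FILTER("1751-01-01"^^xsd:dateTime <= ?dob && ?dob < "1801-01-01"^^xsd:dateTime).
  # get the English label, if it exists (optional in this sketch)
  OPTIONAL {
    ?writer rdfs:label ?writerLabel.
    FILTER(LANG(?writerLabel) = "en").
  }
  # get the pseudonym, if it exists
  OPTIONAL { ?writer wdt:P742 ?pseudonym. }
  # first bound value wins: pseudonym, then English label, then the placeholder
  BIND(COALESCE(?pseudonym, ?writerLabel, "<no label>") AS ?label).
}
```

The order of the arguments is the order of preference, so further fallbacks can be added simply by listing more expressions.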

Grouping

So far, all the queries we’ve seen were queries that found all items satisfying some conditions; in some cases, we also included extra statements on the item (paintings with materials, Arthur Conan Doyle books with title and illustrator).

But it’s very common that we don’t want a long list of all results. Instead, we might ask questions like this:

  • How many paintings were painted on canvas / poplar wood / etc.?
  • What is the highest population of each country’s cities?
  • What is the total number of guns produced by each manufacturer?
  • Who publishes, on average, the longest books?

City populations

Let’s look at the second question for now. It’s fairly simple to write a query that lists all cities along with their population and country, ordered by country:

SELECT ?country ?city ?population
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P17 ?country;
        wdt:P1082 ?population.
}
ORDER BY ?country
Try it!

(Note: that query returns a lot of results, which might cause trouble for your browser. You might want to add a LIMIT clause.)
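As a minimal sketch, the same query with a LIMIT clause might look like this. The value 100 is an arbitrary choice for illustration.

```sparql
SELECT ?country ?city ?population
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P17 ?country;
        wdt:P1082 ?population.
}
ORDER BY ?country
# return at most 100 rows; LIMIT goes after ORDER BY
LIMIT 100
```

Since LIMIT is applied after ORDER BY, this should mainly spare your browser; the query service still has to find and sort the matching results first.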

Since we’re ordering the results by country, all cities belonging to a country form one contiguous block in the results. To find the highest population within that block, we want to consider the block as a group, and aggregate all the individual population values into one value: the maximum. This is done with a GROUP BY clause below the WHERE block, and an aggregate function (MAX) in the SELECT clause.

SELECT ?country (MAX(?population) AS ?maxPopulation)
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P17 ?country;
        wdt:P1082 ?population.
}
GROUP BY ?country
Try it!

We’ve replaced the ORDER BY with a GROUP BY. The effect of this is that all results with the same ?country are now grouped together into a single result. This means that we have to change the SELECT clause as well. If we kept the old clause SELECT ?country ?city ?population, which ?city and ?population would be returned? Remember, each grouped result now stands for many original results; they all have the same ?country, so we can select that, but since they can each have a different ?city and ?population, we have to tell WDQS which of those values to select. That’s the job of the aggregate function. In this case, we’ve used MAX: out of all the ?population values in the group, we select the maximum one for the group result. (We also have to give that value a new name with the AS construct, but that’s just a minor detail.)

This is the general pattern for writing group queries: write a normal query that returns the data you want (not grouped, with many results per “group”), then add a GROUP BY clause and add an aggregate function to all the non-grouped variables in the SELECT clause.

Painting materials

Let’s try it out with another question: How many paintings were painted on each material? First, write a query that just returns all paintings along with their painting material. (Take care to only use those made from material (P186) statements that have the qualifier applies to part (P518): painting support (Q861259).)

SELECT ?material ?painting
WHERE
{
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
            p:P186 [ ps:P186 ?material; pq:P518 wd:Q861259 ].
}
Try it!

Next, add a GROUP BY clause on the ?material, and then an aggregate function on the other selected variable (?painting). In this case, we are interested in the number of paintings; the aggregate function for that is COUNT.

SELECT ?material (COUNT(?painting) AS ?count)
WHERE
{
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
            p:P186 [ ps:P186 ?material; pq:P518 wd:Q861259 ].
}
GROUP BY ?material
Try it!

One problem with this is that we don’t have the label for the materials, so the results are a bit inconvenient to interpret. If we just add the label variable, we’ll get an error:

SELECT ?material ?materialLabel (COUNT(?painting) AS ?count)
WHERE
{
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
            p:P186 [ ps:P186 ?material; pq:P518 wd:Q861259 ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?material
Try it!

Bad aggregate

“Bad aggregate” is an error message you’ll probably see a lot when working with group queries; it means that one of the selected variables needs an aggregate function but doesn’t have one, or it has an aggregate function but isn’t supposed to have one. In this case, WDQS thinks that there might be multiple ?materialLabels per ?material (even though we know that can’t happen), and so it complains that you’re not specifying an aggregate function for that variable.

One solution is to group over multiple variables. If you list multiple variables in the GROUP BY clause, there’s one result for each combination of those variables, and you can select all those variables without an aggregate function. In this case, we’ll group over both ?material and ?materialLabel.

SELECT ?material ?materialLabel (COUNT(?painting) AS ?count)
WHERE
{
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
            p:P186 [ ps:P186 ?material; pq:P518 wd:Q861259 ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?material ?materialLabel
Try it!

We’re almost done with the query – just one more improvement: we’d like to see the most-used materials first. Fortunately, we’re allowed to use the new, aggregated variables from the SELECT clause (here, ?count) in an ORDER BY clause, so this is very simple:

SELECT ?material ?materialLabel (COUNT(?painting) AS ?count)
WHERE
{
  ?painting wdt:P31/wdt:P279* wd:Q3305213;
            p:P186 [ ps:P186 ?material; pq:P518 wd:Q861259 ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?material ?materialLabel
ORDER BY DESC(?count)
Try it!

As an exercise, let’s do the other queries too.

Guns by manufacturer

What is the total number of guns produced by each manufacturer?
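One possible solution, following the same pattern as the painting materials query. The identifiers are assumptions that you should verify by searching Wikidata: firearm (Q12796) as the class of guns, manufacturer (P176), and total produced (P1092).

```sparql
SELECT ?manufacturer ?manufacturerLabel (SUM(?produced) AS ?totalProduced)
WHERE
{
  # any instance of firearm (assumed Q12796) or of one of its subclasses
  ?gun wdt:P31/wdt:P279* wd:Q12796;
       wdt:P176 ?manufacturer;   # manufacturer (assumed P176)
       wdt:P1092 ?produced.      # total produced (assumed P1092)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?manufacturer ?manufacturerLabel
ORDER BY DESC(?totalProduced)
```

The aggregate here is SUM rather than COUNT, because the question asks for the number of guns produced, not the number of gun models.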

Publishers by number of pages

What is the average (function: AVG) number of pages of books by each publisher?
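One possible solution is sketched below. It uses publisher (P123) and number of pages (P1104), the same properties as the query in the HAVING section that follows.

```sparql
SELECT ?publisher ?publisherLabel (AVG(?pages) AS ?avgPages)
WHERE
{
  # every item that has both a publisher and a number of pages
  ?book wdt:P123 ?publisher;
        wdt:P1104 ?pages.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?publisher ?publisherLabel
ORDER BY DESC(?avgPages)
```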

HAVING

A small addendum to that last query – if you look at the results, you might notice that the top result has an outrageously large average, over ten times that of the second place. A bit of investigation reveals that this is because that publisher (UTET (Q4002388)) only published a single book with a number of pages (P1104) statement, Grande dizionario della lingua italiana (Q3775610), which skews the results a bit. To remove outliers like that, we could try to select only publishers that published at least two books with number of pages (P1104) statements on Wikidata.

How do we do that? Normally, we restrict results with a FILTER clause, but in this case we want to restrict based on the group (the number of books), not any individual result. This is done with a HAVING clause, which can be placed right after a GROUP BY clause and takes an expression just like FILTER does:

SELECT ?publisher ?publisherLabel (AVG(?pages) AS ?avgPages)
WHERE
{
  ?book wdt:P123 ?publisher;
        wdt:P1104 ?pages.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?publisher ?publisherLabel
HAVING(COUNT(?book) > 1)
ORDER BY DESC(?avgPages)
Try it!

Aggregate functions summary

Here’s a short summary of the available aggregate functions:

  • COUNT: the number of elements. You can also write COUNT(*) to simply count all results.
  • SUM, AVG: the sum or average of all elements, respectively. If the elements aren’t numbers, you’ll get weird results.
  • MIN, MAX: the minimum or maximum value of all elements, respectively. This works for all value types; numbers are sorted numerically, strings and other types lexically.
  • SAMPLE: any element. This is occasionally useful if you know there’s only one result, or if you don’t care which one is returned.
  • GROUP_CONCAT: concatenates all elements into a single string. This is useful, for example, if you want only one result per item but also want to include information from a property that may have several statements for that item, such as the occupations of a person. The different occupations can then be concatenated into a single variable instead of being spread over several result rows. If you’re curious, you can look it up in the SPARQL specification.

Additionally, you can add a DISTINCT modifier for any of these functions to eliminate duplicate results. For example, if there are two results but they both have the same value in ?var, then COUNT(?var) will return 2 but COUNT(DISTINCT ?var) will only return 1. You often have to use DISTINCT when your query can return the same item multiple times – this can happen if, for example, you use ?item wdt:P31/wdt:P279* ?class, and there are multiple paths from ?item to ?class: you will get a new result for each of those paths, even though all the values in the result are identical. (If you’re not grouping, you can also eliminate those duplicate results by starting the query with SELECT DISTINCT instead of just SELECT.)
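To illustrate COUNT, GROUP_CONCAT and DISTINCT together, here is a sketch that reuses the French writers pattern from the pseudonym example and lists each writer’s occupations in a single row. The variable names ?occupationCount and ?occupations and the ", " separator are choices made for this sketch.

```sparql
SELECT ?writer
       (COUNT(DISTINCT ?occupation) AS ?occupationCount)
       (GROUP_CONCAT(DISTINCT ?occupationLabel; separator=", ") AS ?occupations)
WHERE
{
  # French writer born in the second half of the 18th century
  ?writer wdt:P31 wd:Q5;
          wdt:P27 wd:Q142;
          wdt:P106 wd:Q36180;
          wdt:P569 ?dob.
  FILTER("1751-01-01"^^xsd:dateTime <= ?dob && ?dob < "1801-01-01"^^xsd:dateTime).
  # every occupation of the writer, with its English label
  ?writer wdt:P106 ?occupation.
  ?occupation rdfs:label ?occupationLabel.
  FILTER(LANG(?occupationLabel) = "en").
}
GROUP BY ?writer
```

DISTINCT matters here: a writer with two date of birth statements would otherwise match every occupation twice, which would inflate the count and repeat labels in the concatenated string.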

wikibase:Label and aggregations bug

A query such as the following, which searches for all academics with more than two countries of citizenship in Wikidata, does not show the names of those countries in the ?citizenships column:

select ?person ?personLabel (group_concat(?citizenshipLabel;separator="/") as ?citizenships) {
  # find all academics
  ?person wdt:P106 wd:Q3400985 ;   
          wdt:P27  ?citizenship .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} group by ?person ?personLabel having (count(?citizenship) > 2)
Try it!

To show the ?citizenships, explicitly name the ?personLabel and ?citizenshipLabel in the wikibase:label service call like this:

  SERVICE wikibase:label { 
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
    ?citizenship rdfs:label ?citizenshipLabel .
    ?person      rdfs:label ?personLabel .
  }

The following query works as expected:

select ?person ?personLabel (group_concat(?citizenshipLabel;separator="/") as ?citizenships) {
  ?person wdt:P106 wd:Q3400985 ;
          wdt:P27  ?citizenship .
  SERVICE wikibase:label { 
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
    ?citizenship rdfs:label ?citizenshipLabel .
    ?person      rdfs:label ?personLabel .
  }
} group by ?person ?personLabel having (count(?citizenship) > 2)
Try it!

VALUES

One can select items based on a list of items:

SELECT ?item ?itemLabel ?mother ?motherLabel WHERE {
  VALUES ?item { wd:Q937 wd:Q1339 }
  OPTIONAL { ?item wdt:P25 ?mother. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!

One can also select based on a list of values of a specific property:

SELECT ?item ?itemLabel ?mother ?motherLabel ?ISNI WHERE {
  VALUES ?ISNI { "000000012281955X" "0000000122764157" }
  ?item wdt:P213 ?ISNI.
  OPTIONAL { ?item wdt:P25 ?mother. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!

VALUES can do more: it can also enumerate the possible values for a pair (or any tuple) of variables. For example, say you want to use (known) custom labels for the persons enumerated in the first VALUES example. You can then use a VALUES clause such as VALUES (?item ?customItemLabel) { (wd:Q937 "Einstein") (wd:Q1339 "Bach") }, which ensures that whenever ?item has the value wd:Q937 in a result, ?customItemLabel has the value Einstein, and whenever ?item has the value wd:Q1339, ?customItemLabel has the value Bach.

SELECT ?item ?customItemLabel ?mother ?motherLabel WHERE {
  VALUES (?item ?customItemLabel) { (wd:Q937 "Einstein") (wd:Q1339 "Bach") }
  OPTIONAL { ?item wdt:P25 ?mother. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!

And beyond…

This guide ends here. SPARQL doesn’t: there’s still a lot that I haven’t shown you – I never promised this was going to be a complete guide! If you got this far, you already know a lot about WDQS and should be able to write some very powerful queries. But if you want to learn even more, here are some things you can look at:

  • Subqueries. You add another entire query in curly brackets ({ SELECT ... WHERE { ... } LIMIT 10 }), and the results are visible in the outer query. (If you’re familiar with SQL, you’ll have to rethink the concept a bit – SPARQL subqueries are purely “bottom-up” and can’t use values from the outer query, like SQL “correlated subqueries” can.)
  • MINUS lets you select results that don’t fit some graph pattern. FILTER NOT EXISTS is mostly equivalent (see the SPARQL spec for an example where they differ), but – at least on WDQS – usually slower by quite a bit.
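As a rough sketch of the subquery idea, the following wraps the city population query from the grouping section in an inner SELECT that keeps only the ten countries with the largest maximum city population, and lets the outer query add labels. The limit of 10 and the overall arrangement are illustrative assumptions, not a recommended recipe.

```sparql
SELECT ?country ?countryLabel ?maxPopulation
WHERE
{
  {
    # inner query: evaluated on its own, without access to outer variables
    SELECT ?country (MAX(?population) AS ?maxPopulation)
    WHERE
    {
      ?city wdt:P31/wdt:P279* wd:Q515;
            wdt:P17 ?country;
            wdt:P1082 ?population.
    }
    GROUP BY ?country
    ORDER BY DESC(?maxPopulation)
    LIMIT 10
  }
  # outer query: only needs to fetch labels for the ten remaining countries
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?maxPopulation)
```

The outer ORDER BY is repeated because the order produced by the inner query is not guaranteed to survive in the outer results.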

Your main reference for these and other topics is the SPARQL specification.

You can also take a look at the SPARQL tutorial on Wikibooks and this tutorial by data.world.

And of course, there are some parts of Wikidata still missing as well, such as references, numeric precision (100±2.5), values with units (two kilograms), geocoordinates, sitelinks, statements on properties, and more. You can see how those are modeled as triples under mw:Wikibase/Indexing/RDF Dump Format.

See also