Wikidata:Creating a bot
This page explains how to create bots for Wikidata. Please share your source code so that others can add new features and improvements.
Requirements
To create a bot, you will need:
- Some coding skills (Python, Perl, PHP...)
- A framework (one of those below) and some code to make it work
- A bot account (and approval for the bot flag)
- A source code editor (Notepad++, Geany, vi, emacs)
Recommendation
- Join a Wikidata Telegram channel and participate in the discussions (and ask for help if you get stuck programming).
Pywikibot
Warning: This bot framework has incomplete support for lexemes as of June 2022. See other libraries below for full support.
The following sections describe how to install and configure pywikibot. You only need to go through this process once. There are also some basic examples of simple bot programming.
Installation
For more details about installing pywikibot, see mw:Manual:Pywikibot/Installation.
To install pywikibot:
- Install Python (Python v2.7.2 or higher, or v3.3.0 or higher, is required)
- Download pywikibot:
  - download the zip file
  - or check it out from a repository:
    - On Windows: install TortoiseSVN and use the repository http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/
    - On Ubuntu: install Subversion (sudo apt-get install subversion) and run: svn checkout http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/ pywikipedia
  - or see another repository: mw:Manual:Pywikibot/Installation
Configuration
- For more details about configuring pywikibot, see mw:Manual:Pywikibot/Installation.
You must create user-config.py with your bot's username, the family project, and the language. On Wikidata, the family and the language are the same: wikidata.
mylang = "wikidata"
family = "wikidata"
usernames["wikidata"]["wikidata"] = 'MyBotName'
You can reduce the delay between two edits by setting: put_throttle = 1
Login
After you have configured user-config.py, you can log in as follows:
$ python login.py
When asked for your bot's password, enter it and press Enter. If you did everything right, you are now logged in.
Example 1: Get data
This example gets data for the page referring to Douglas Adams. Save the following source code in a file and execute it with python example1.py
import pywikibot
site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, 'Douglas Adams')
item = pywikibot.ItemPage.fromPage(page)
dictionary = item.get()
print(dictionary)
print(dictionary.keys())
print(item)
item.get() connects to Wikidata and fetches the data. The output is as follows (reformatted for clarity):
{
  'claims': {
    'P646': [<pywikibot.page.Claim instance at 0x7f1880188b48>],
    'P800': [<pywikibot.page.Claim instance at 0x7f1880188488>,
             <pywikibot.page.Claim instance at 0x7f1880188368>]
    ...
  }
  'labels': {
    'gu': '\u0aa1\u0a97\u0acd\u0ab2\u0abe\u0ab8 \u0a8f\u0aa1\u0aae\u0acd\u0ab8',
    'scn': 'Douglas Adams',
    ...
  }
  'sitelinks': {
    'fiwiki': 'Douglas Adams',
    'fawiki': '\u062f\u0627\u06af\u0644\u0627\u0633 \u0622\u062f\u0627\u0645\u0632',
    'elwikiquote': '\u039d\u03c4\u03ac\u03b3\u03ba\u03bb\u03b1\u03c2 \u0386\u03bd\u03c4\u03b1\u03bc\u03c2',
    ...
  }
  'descriptions': {
    'eo': 'angla a\u016dtoro de sciencfikcio-romanoj kaj humoristo',
    'en': 'English writer and humorist',
  },
  'aliases': {
    'ru': ['\u0410\u0434\u0430\u043c\u0441, \u0414\u0443\u0433\u043b\u0430\u0441'],
    'fr': ['Douglas Noel Adams', 'Douglas No\xebl Adams'],
    ...
  }
}
['claims', 'labels', 'sitelinks', 'descriptions', 'aliases']
[[wikidata:Q42]]
First the dictionary is printed, containing:
- the set of claims on the page: Property:P646 is the Freebase identifier, Property:P800 is "notable work", and so on
- the labels of the item in many languages
- the sitelinks of the item, not only to Wikipedia in many languages but also to Wikiquote in many languages
- the descriptions of the item in many languages
- the aliases of the item in many languages
Then comes a list of all keys of the key-value pairs in the dictionary. Finally, we see that the Wikidata item about Douglas Adams is Q42.
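To illustrate navigating this structure without a network connection, here is a sketch using a trimmed-down plain dict in place of the real item.get() result (the sample values below are illustrative only):

```python
# A trimmed-down stand-in for the dict returned by item.get()
# (hypothetical sample values; the real result is much larger).
data = {
    'labels': {'en': 'Douglas Adams', 'scn': 'Douglas Adams'},
    'descriptions': {'en': 'English writer and humorist'},
    'aliases': {'fr': ['Douglas Noel Adams']},
    'sitelinks': {'fiwiki': 'Douglas Adams'},
    'claims': {'P800': []},
}

print(data['labels']['en'])        # the label in one language
print(sorted(data.keys()))         # the five top-level keys
print('P800' in data['claims'])    # does the item have "notable work" claims?
```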
Alternative
The example above uses the English Wikipedia article to get the ItemPage. Alternatively, we can get the ItemPage directly:
import pywikibot
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, 'Q42')
Example 2: Get interwiki links
After item.get(), we can access the sitelinks, for example. These are the links to all Wikipedias that have the article.
import pywikibot
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, 'Q42')
item.get()
print(",".join(item.sitelinks))
The output is:
{'fiwiki': 'Douglas Adams', 'eowiki': 'Douglas Adams', 'dewiki': 'Douglas Adams', ...}
With item.iterlinks(), an iterator over all these sitelinks is returned, where each article is given not as plain text as above but already as a Page object for further treatment (e.g., edit the text in the corresponding Wikipedia articles).
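As a sketch of post-processing these sitelinks (using a plain dict like the output above instead of a live item, so no network access is needed):

```python
# A plain-dict stand-in for item.sitelinks (sample values only).
sitelinks = {
    'fiwiki': 'Douglas Adams',
    'dewiki': 'Douglas Adams',
    'elwikiquote': 'Νταγκλας Ανταμς',
}

# Keep only Wikipedia links: their keys end in 'wiki'
# (Wikiquote keys end in 'wikiquote' and are skipped).
wikipedias = {k: v for k, v in sitelinks.items() if k.endswith('wiki')}
print(wikipedias)
```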
Example 4: Set a description
This example sets an English and a German description for the item about Douglas Adams.
import pywikibot
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, 'Q42')
item.get()
mydescriptions = {'en': 'English writer and humorist', 'de': 'Keine Panik!'}
item.editDescriptions(mydescriptions, summary='Setting/updating descriptions.')
Setting labels and aliases works accordingly.
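A minimal sketch of the dict shapes involved (the values here are illustrative; the method names editLabels and editAliases are pywikibot's, and the actual calls are shown only as comments since they require a live ItemPage):

```python
# Label and alias updates use the same shape as descriptions:
# a mapping from language code to value (aliases map to a *list* of strings).
mylabels = {'en': 'Douglas Adams', 'de': 'Douglas Adams'}
myaliases = {'en': ['Douglas Noel Adams', 'Douglas N. Adams']}

# With an ItemPage "item" (see Example 4) you would then call:
#   item.editLabels(mylabels, summary='Setting labels.')
#   item.editAliases(myaliases, summary='Setting aliases.')
print(mylabels, myaliases)
```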
Example 6: Set a sitelink
To set a sitelink, we can either create a corresponding dict as in Example 4 or use Page objects:
import pywikibot
site = pywikibot.Site("en", "wikipedia")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, 'Q42')
page = pywikibot.Page(site, 'Douglas Adams')
item.setSitelink(page, summary='Setting (/updating?) sitelink.')
Example 7: Set a statement
Statements are set using the Claim class. In the following, we set place of birth (P19): Cambridge (Q350) for Douglas Adams.
import pywikibot
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, 'Q42')
claim = pywikibot.Claim(repo, 'P19')
target = pywikibot.ItemPage(repo, 'Q350')
claim.setTarget(target)
item.addClaim(claim, summary='Adding claim')
For other datatypes this works similarly. In the following, we add claims with the string (IMDb ID (P345)) and coordinate (coordinate location (P625)) datatypes (URL works the same as string):
import pywikibot
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, 'Q42')
stringclaim = pywikibot.Claim(repo, 'P345')
stringclaim.setTarget('nm0010930')
item.addClaim(stringclaim, summary='Adding string claim')
coordinateclaim = pywikibot.Claim(repo, 'P625')
coordinate = pywikibot.Coordinate(lat=52.208, lon=0.1225, precision=0.001, site=site)
coordinateclaim.setTarget(coordinate)
item.addClaim(coordinateclaim, summary='Adding coordinate claim')
Example 8: Add a qualifier
Qualifiers are also represented by the Claim class. In the following, we add the qualifier incertae sedis (P678): family (Q35409) to the Claim "claim". Make sure you add the item before adding the qualifier.
qualifier = pywikibot.Claim(repo, 'P678')
target = pywikibot.ItemPage(repo, "Q35409")
qualifier.setTarget(target)
claim.addQualifier(qualifier, summary='Adding a qualifier.')
Example 9: Add a source
Sources are likewise represented by the Claim class. Unlike qualifiers, a source can contain more than one claim. In the following, we add stated in (P248): Integrated Taxonomic Information System (Q82575) with retrieved (P813): March 20, 2014 as a source for the Claim "claim". The claim must have been retrieved from Wikidata or added to the item page beforehand.
statedin = pywikibot.Claim(repo, 'P248')
itis = pywikibot.ItemPage(repo, "Q82575")
statedin.setTarget(itis)
retrieved = pywikibot.Claim(repo, 'P813')
date = pywikibot.WbTime(year=2014, month=3, day=20)
retrieved.setTarget(date)
claim.addSources([statedin, retrieved], summary='Adding sources.')
Example 10: Page generators
TODO
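Until this section is written, here is the underlying idea: pywikibot's pagegenerators module yields pages one at a time, so a bot can stream over a large set without loading everything up front. The pure-Python sketch below imitates that pattern with hypothetical item IDs; a real bot would iterate over pywikibot.ItemPage objects produced by a generator instead.

```python
def item_generator(ids):
    """Yield item IDs one at a time, like a pywikibot page generator."""
    for qid in ids:
        # A real generator would yield pywikibot.ItemPage objects here.
        yield qid

# Process a (hypothetical) batch lazily instead of building the full list first.
processed = [qid for qid in item_generator(['Q42', 'Q350', 'Q82575'])]
print(processed)
```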
Example 11: Get values of sub-properties
In the following, we get values of sub-properties from branch described by source (P1343) -> Great Soviet Encyclopedia (1969–1978) (Q17378135) -> properties reference URL (P854) and title (P1476).
import pywikibot
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, 'Q13515')
item.get()
sourcesid = 'P1343'
sourceid = 'Q17378135'
urlid = 'P854'
nameid = 'P1476'
# item.claims['P1343'][1].qualifiers.items() would access the qualifiers
# directly, but the hard-coded list index [1] will break over time.
if sourcesid in item.claims:
    for source in item.claims[sourcesid]:
        if source.target.id == sourceid:
            s = source.qualifiers
            if urlid in s:
                url = s.get(urlid)[0].target
            if nameid in s:
                name = s.get(nameid)[0].target['text']
            print(url, name)
More examples
Some users share their source codes. Learn more in the next links:
- User:RobotMichiel1972/wikidata lowercase.py - a pywikipedia example of how to correct a label to lowercase, using the English label's capitalization as the 'reference' (here hard-coded for nlwiki only), running over a selection of pages on one's own wikipedia.
- File:Bots hackathon 2013.pdf presents "claimit.py" and "template_harvest.py", which are included in the core version (the former rewrite).
Wikidata Integrator
WikidataIntegrator is a library for reading and writing to Wikidata/Wikibase. We created it for populating Wikidata with content from authoritative resources on Genes, Proteins, Diseases, Drugs and others. Details on the different tasks can be found on the bot's Wikidata page.
Pywikibot is an existing framework for interacting with the MediaWiki API. The reason why we came up with our own solution is that we need a high integration with the Wikidata SPARQL endpoint in order to ensure data consistency (duplicate checks, consistency checks, correct item selection, etc.). Compared to Pywikibot, WikidataIntegrator currently is not a full Python wrapper for the MediaWiki API but is solely focused on providing an easy means to generate Python-based Wikidata bots.
For more information, documentation, download & installation instructions, see here: https://github.com/SuLab/WikidataIntegrator/
Example Notebook
An example notebook demonstrating an example bot to add therapeutic areas to drug items, including using fastrun mode, checking references, and removing old statements:
http://public-paws.wmcloud.org/46883698/example%20ema%20bot.ipynb
WikibaseIntegrator
WikibaseIntegrator was forked from WikidataIntegrator by User:Myst in 2020 and has seen several improvements to the API that make it even easier to create bots using the library.
For more information, documentation, download & installation instructions, see here: https://github.com/LeMyst/WikibaseIntegrator
Example semi-automatic script
LexUse is a semi-automatic tool for finding and adding usage examples to lexemes. It is free software, written in Python 3 in 2020. See Wikidata:LexUse.
Wikibase.NET (Deprecated)
Wikibase.NET is the API client for the MediaWiki extension Wikibase, replacing the now-deprecated DotNetDataBot. The two are not compatible, because Wikibase.NET no longer needs the DotNetWikiBot framework.
Download & Installation
You can download Wikibase.NET from GitHub. Just follow the instructions on that page.
Known issues
Examples
Coming not soon...
DotNetDataBot (Deprecated)
Installation
- Download: DotNetDataBot
Configuration
After unpacking the package you will see a file called DotNetDataBot.dll and one called DotNetDataBot.xml. The XML document is only for documentation. To use the library, add a new reference to it in your project. Then you can write using DotNetDataBot; to import the framework.
Login
To log in, you have to create a new Site object with the URL of the wiki, your bot's username, and its password.
using DotNetDataBot;
public static void Main()
{
Site wikidata = new Site("http://www.wikidata.org", "User", "Password");
}
Example 1: Get id using wiki page
You can access the id of an item by searching with the site and the title of the connected page.
using DotNetDataBot;
public static void Main()
{
Site site = new Site("http://www.wikidata.org", "User", "Password");
Item item = new Item(site);
if (item.itemExists("it", "George Lucas")) // Check whether it exists on Wikidata
{
Console.Write("Q" + item.GetIdBySitelink("it", "George Lucas"));
}
else
{
Console.Write("Doesn't exist");
}
}
Example 2: Get interwiki links
You can get the interwiki links of an item by loading the content and accessing the links field of the object.
using DotNetDataBot;
public static void Main()
{
Site site = new Site("http://www.wikidata.org", "User", "Password");
Item item = new Item(site);
item.id = item.GetIdBySitelink("it", "George Lucas");
item.Load();
foreach(KeyValuePair<string, string> link in item.links)
{
Console.Write(link.Key); // lang (eg. en or it)
Console.Write(link.Value); // page (eg. George_Lucas)
}
}
Example 3: Set a description
To set a description, you must call the setDescription function.
using DotNetDataBot;
public static void Main()
{
Site site = new Site("http://www.wikidata.org", "User", "Password");
Item item = new Item(site, "Q4115189");
item.Load();
if (item.descriptions.ContainsKey("it")) // if an Italian description already exists
{
// Nothing to do
}
else
{
item.setDescription("it", "description in italian", "Bot: Add italian description");
}
}
Example 4: Set a label
It works the same way for setting a label. Just call setLabel.
using DotNetDataBot;
public static void Main()
{
Site site = new Site("http://www.wikidata.org", "User", "Password");
Item item = new Item(site, "Q4115189");
item.Load();
if (item.labels.ContainsKey("it")) // if an Italian label already exists
{
// Nothing to do
}
else
{
item.setLabel("it", "label in italian", "Bot: Add italian label");
}
}
Example 5: Get interwiki links for 100 pages
This feature is not supported. Just iterate over the list.
Wikibase api for PHP
This is an API client for Wikibase written in PHP. It can be downloaded from here.
Example 1: Basic example
Take a look at the source comments to understand how it works.
<?php
/**
 * Basic example for the use of the library with some small edits
*/
require_once( __DIR__ . "/vendor/autoload.php" );
// Creates some useful objects and logs into the api
$api = new \Mediawiki\Api\MediawikiApi( "http://www.wikidata.org/w/api.php" );
$api->login( new \Mediawiki\Api\ApiUser( 'username', 'password' ) );
$dataValueClasses = array(
'unknown' => 'DataValues\UnknownValue',
'string' => 'DataValues\StringValue',
);
$wikidata = new \Wikibase\Api\WikibaseFactory(
$api,
new DataValues\Deserializers\DataValueDeserializer( $dataValueClasses ),
new DataValues\Serializers\DataValueSerializer()
);
// Gets the current revision for item Q777
$revision = $wikidata->newRevisionGetter()->getFromId( 'Q777' );
$item = $revision->getContent()->getData();
// Outputs the current sitelink for enwiki
var_dump( $item->getSiteLink( 'enwiki' ) );
// Sets the de description to 'Foobar'
$item->getFingerprint()->setDescription( 'de', 'Foobar' );
// Saves the item
$wikidata->newRevisionSaver()->save( $revision );
//Log out
$api->logout();
Example 2: Creating claims
Take a look at the source comments to understand how it works.
<?php
/**
* Basic example for the use of the library with some small edits.
*/
require_once( __DIR__ . "/vendor/autoload.php" );
// Creates some useful objects and logs into the api
$api = new \Mediawiki\Api\MediawikiApi( "https://www.wikidata.org/w/api.php" );
$api->login( new \Mediawiki\Api\ApiUser( 'username', 'password' ) );
$dataValueClasses = array(
'unknown' => 'DataValues\UnknownValue',
'string' => 'DataValues\StringValue',
);
$services = new \Wikibase\Api\WikibaseFactory(
$api,
new DataValues\Deserializers\DataValueDeserializer( $dataValueClasses ),
new DataValues\Serializers\DataValueSerializer()
);
$revision = $services->newRevisionGetter()->getFromId( 'Q777' );
$item = $revision->getContent()->getData();
$statementList = $item->getStatements();
if( $statementList->getByPropertyId( \Wikibase\DataModel\Entity\PropertyId::newFromNumber( 1320 ) )->isEmpty() ) {
$services->newStatementCreator()->create(
new \Wikibase\DataModel\Snak\PropertyValueSnak(
\Wikibase\DataModel\Entity\PropertyId::newFromNumber( 1320 ),
new \DataValues\StringValue( 'New String Value' )
),
'Q777'
);
}
// Log out
$api->logout();
VBot (no updates since 2017)
A framework for Wikidata and Wikipedia. It can read and write on Wikidata and other Wikimedia projects, and it has a useful list generator for generating lists of Wikipedia pages and Wikidata entities. It can also read the JSON dump of Wikidata.
Overview
Bot to read and edit Wikidata and Wikipedia.
- License: CC0 1.0
- Language C#
- Can read and write entities with all datatype on Wikidata
- Can read and write pages on all Wiki project
- Can read parameter from template on wiki pages
- Can read JSON dump
- Can create lists using:
- Wikidata query
- Catscan 2
- Quick intersection
- What Links Here on Wikidata
- Tested with Visual Studio Express 2013 for Windows Desktop.
- Newtonsoft.Json is required. You can install it with NuGet inside Visual Studio.
- You need to manually add a reference to System.Web for "HttpUtility.UrlEncode".
Download
The framework can be downloaded from GitHub here.
Instructions
- Wiki (partial)
Example 1
Update the English label for all items with instance of (P31): short film (Q24862) that have director (P57) and publication date (P577) in 1908. (Uses Wikidata query.)
private void CompleteExample()
{
//Wikidata query
string strWDQ = "CLAIM[31:24862] AND CLAIM[57] AND BETWEEN[577,+00000001908-00-00T00:00:00Z,+00000001908-12-31T00:00:00Z]";
ListGenerator lg = new ListGenerator();
List<string> chunks = lg.WDQ(strWDQ, 50);
//Connection to Wikipedia
WikimediaAPI WP = new WikimediaAPI("https://it.wikipedia.org", User, Password);
Pages PageList = new Pages();
//Connection to Wikidata
WikimediaAPI WD = new WikimediaAPI("https://www.wikidata.org", User, Password);
Entities EntityList = new Entities();
Dictionary<string, string> Labels = new Dictionary<string, string>();
foreach (string list in chunks)
{
// Load all entities of the chunk
string strJson = WD.LoadWD(list);
EntityList = new Entities();
EntityList = JsonConvert.DeserializeObject<Entities>(strJson, new DatavalueConverter());
foreach (KeyValuePair<string, Entity> entity in EntityList.entities)
{
if (entity.Value.sitelinks.ContainsKey("itwiki"))
{
// Load Wikipage
string Pages = WP.LoadWP(entity.Value.sitelinks["itwiki"].title);
PageList = JsonConvert.DeserializeObject<Pages>(Pages, new DatavalueConverter());
//Director from template
string director = Utility.GetTemplateParameter(PageList.query.FirstPageText, "film","Regista").Replace("[","").Replace("]", "");
Labels = new Dictionary<string, string>();
if (director=="")
{
Labels.Add("en", "1908 short movie");
}
else
{
Labels.Add("en", "1908 short movie directed by " + director);
}
// Update Wikidata
WD.EditEntity(entity.Value.id, null, Labels, null, null, null, "BOT: Update en label");
}
}
}
}
LexData (Python; for Lexicographical data)
LexData is an easy-to-use Python library for creating and editing Lexemes, Senses, and Forms.
Tips
The documentation of LexData is still a bit lacking, so look at existing implementations in MachtSinn or Wikidata Lexeme Forms for ideas on how to use it.
If you only want to add statements to Lexemes (not Forms or Senses), WikibaseIntegrator might be a better choice, as it is more versatile and supports a lot of datatypes.
Installation
You can install LexData via pip:
$ pip install LexData
Login
For all operations you need a WikidataSession. You can create it with your credentials, a bot password, or an edit token (for example, to edit via OAuth):
repo = LexData.WikidataSession("YourUsername", "YourPassword")
Retrieve a Lexeme
You can open existing Lexemes and read their content.
L2 = LexData.Lexeme(repo, "L2")
print(L2.claims)
print(L2.forms)
print(L2.senses)
sense1 = L2.senses[0]
print(sense1.claims)
Searching and creating Lexemes
If you don't know the L-Id of a lexeme you can search for it. And if it doesn't exist you can create it.
# "en" below is a LexData language object (see the LexData documentation
# for the languages it provides).
# Find an existing Lexeme by lemma, language and grammatical form
L2 = LexData.search_lexemes(repo, "first", en, "Q1084")
# Create a new Lexeme
L2 = LexData.create_lexeme(repo, "first", en, "Q1084")
# Find or create a Lexeme
L2 = LexData.get_or_create_lexeme(repo, "first", en, "Q1084")
Adding information
You can easily create forms or senses, with or without additional claims:
if len(L2.forms) == 0:
    L2.createForm("firsts", ["Q146786"])
if len(L2.senses) == 0:
    L2.createSense(
        {
            "en": "Element in an ordered list which comes before all others according to the ordering",
            "de": "einer Ordnung folgend das Element vor allen anderen",
        },
        claims={"P5137": ["Q19269277"]},
    )
Using Wikidata's API directly
The other sections describe how to use bot frameworks to access and update Wikidata information.
You can also directly interact with the Wikibase API that Wikidata provides.
You need to do this if you're developing your own framework or if you need to do something that a framework doesn't support.
The documentation for the Wikibase API can be found at mediawiki.org. You can also play around with it at Special:ApiSandbox; try action=wbgetentities.
Wikibase provides its API as a set of modules for MediaWiki's "action" API. You access this by making HTTP requests to /w/api.php.
The default response format is JSON.
So for your language of choice, you only need a library to perform HTTP requests and a JSON or XML library to parse the responses.
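As an illustration, the request URL used in Example 1 below can be built with nothing but the Python standard library (the parameter values mirror that example):

```python
from urllib.parse import urlencode

# Parameters for the wbgetentities module of the Wikibase action API.
params = {
    'action': 'wbgetentities',
    'titles': 'Andromeda Galaxy',
    'sites': 'enwiki',
    'props': '',
    'format': 'json',
}
url = 'https://www.wikidata.org/w/api.php?' + urlencode(params)
print(url)
# Fetching this URL (e.g. with urllib.request.urlopen) returns JSON
# whose 'entities' object is keyed by the Q number.
```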
Example 1: Get Q number
This example gets the item's Q number for the English Wikipedia article about the Andromeda Galaxy. The Wikibase API's main "workhorse" module, action=wbgetentities, provides this information. The HTTP request (using jsonfm format for human-readable JSON output) is simply:
https://www.wikidata.org/w/api.php?action=wbgetentities&titles=Andromeda%20Galaxy&sites=enwiki&props=&format=jsonfm
Try following the link. This requests no additional information about the entity; remove &props= from the URL to see much more information about it. See the generated help for wbgetentities for more parameters you can specify.
Python
#!/usr/bin/python3
from requests import get

def get_qnumber(wikiarticle, wikisite):
    resp = get('https://www.wikidata.org/w/api.php', {
        'action': 'wbgetentities',
        'titles': wikiarticle,
        'sites': wikisite,
        'props': '',
        'format': 'json'
    }).json()
    return list(resp['entities'])[0]

print(get_qnumber(wikiarticle="Andromeda Galaxy", wikisite="enwiki"))
The output is:
Q2469
Example 2: Get list of items without particular interwiki
...please contribute if you know how...
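One possible approach (an assumption on my part, since the action API has no module for this query): ask the Wikidata Query Service for items lacking a sitelink to a given wiki. The sketch below only builds a SPARQL query string, restricted to humans (wdt:P31 wd:Q5) as a hypothetical example filter; you would then POST it to https://query.wikidata.org/sparql yourself.

```python
def items_without_sitelink_query(site_url='https://de.wikipedia.org/', limit=10):
    """Build a SPARQL query for items (here: humans) with no sitelink to the given wiki."""
    return f"""
SELECT ?item WHERE {{
  ?item wdt:P31 wd:Q5 .
  FILTER NOT EXISTS {{
    ?article schema:about ?item ;
             schema:isPartOf <{site_url}> .
  }}
}}
LIMIT {limit}
"""

query = items_without_sitelink_query()
print(query)
```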
See also
- mw:Wikidata Toolkit Java framework
- Wikidata:Bots