Wikidata:Partnerships and data imports

Partnerships and data imports
This page provides a space to discuss importing data from external sources and forming partnerships with external organisations.

You may find these related resources helpful:

Dataset Imports | Why import data into Wikidata | Learn how to import data | Bot requests | Ask a data import question
Please take a look at the Wikidata frequently asked questions to see if your question has already been answered.
Also see status updates to keep up-to-date on important things around Wikidata.
IRC channel: #wikidata

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day and sections whose oldest comment is older than 30 days.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2018/09.

Importing external identifiers from third party sites

In general, what methods could be used to gather and mass-import external identifiers so that they become property values on Wikidata items? Specifically, I am talking about bgm.tv subject identifier (P5732), bilibili ID (P5733) and Moegirlpedia Chinese Article Entry (P5737). C933103 (talk) 15:40, 2 September 2018 (UTC)

Have you checked Wikidata:Data Import Guide, which I believe someone linked in Wikidata:project chat? --Nemo 16:42, 2 September 2018 (UTC)
Yes, I have. However, the main point of the instructions seems to be copying information into a spreadsheet, and I am not sure how that could be done efficiently when the information is scattered across different web pages on the target site. C933103 (talk) 17:18, 2 September 2018 (UTC)
@C933103: that requires "scraping" your source website first. There are various tools to do that, such as web browser extensions, or tools more oriented towards Wikidata, such as Mix'n'match (see the corresponding blog post here: http://magnusmanske.de/wordpress/?p=494) − Pintoch (talk) 18:50, 2 September 2018 (UTC)
Something like this? https://tools.wmflabs.org/mix-n-match/#/catalog/1734 It seemed to be stuck at NaN; what's wrong? (edit: it seems to be fixed now... but.. [see next section]) C933103 (talk) 04:12, 7 September 2018 (UTC)
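For anyone reading this thread later: once identifiers have been scraped and matched to items, one possible way to mass-import them is to turn the matches into QuickStatements commands. The following is only a minimal sketch under stated assumptions. The function name `to_quickstatements` and the sample pairing are hypothetical, Q4115189 is the Wikidata Sandbox item used purely as a placeholder, and the output assumes the tab-separated QuickStatements V1 format with string values in double quotes.

```python
def to_quickstatements(matches, prop):
    """Build one QuickStatements V1 line per (item, external ID) pair.

    `matches` is an iterable of (QID, external identifier) tuples that
    have already been matched by hand or by a tool; nothing here checks
    that the match is correct.
    """
    lines = []
    for qid, external_id in matches:
        # V1 format: item <TAB> property <TAB> "string value"
        lines.append(f'{qid}\t{prop}\t"{external_id}"')
    return "\n".join(lines)


# Placeholder data: the sandbox item paired with bgm.tv subject 12.
matches = [("Q4115189", "12")]
print(to_quickstatements(matches, "P5732"))
```

The resulting text could be reviewed and then pasted into QuickStatements; the matching step itself still has to be done separately, for example with Mix'n'match.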

bgm.tv subject identifier (P5732)

@Pintoch: I am trying to scrape IDs from the site bgm.tv, but I don't know how to make the tool use the proper encoding. For instance, when the webpage http://bgm.tv/subject/12 is scraped by the scraper, it displays something like <a href="/subject/12" title="人形电脑天使心" property="v:itemreviewed">ちょびっツ</a> while the title and the tag content should be Unicode strings instead. Checking or unchecking the "UTF8-encode" checkbox changes nothing. On the other hand, if I scrape from http://api.bgm.tv/subject/12, it is displayed as "name":"\u3061\u3087\u3073\u3063\u30c4", which uses the \uxxxx scheme to encode Unicode characters. How can those characters be decoded properly in both cases? C933103 (talk) 11:34, 7 September 2018 (UTC)

@C933103: That seems to be a problem with Unicode handling that you could report to Magnus Manske, who maintains that tool. − Pintoch (talk) 12:16, 7 September 2018 (UTC)
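Independently of the tool, both encodings mentioned above can be decoded outside it. The following is a minimal sketch using only the Python standard library; it does not describe how Mix'n'match handles text. The helper names `decode_json_name` and `decode_html_text` are hypothetical, the JSON sample contains only the "name" field quoted above, and the HTML sample assumes the scraper returns numeric character references.

```python
import html
import json


def decode_json_name(json_text):
    """Parse JSON text; the parser turns \\uXXXX escapes into characters."""
    return json.loads(json_text)["name"]


def decode_html_text(fragment):
    """Convert HTML character references (e.g. &#12385;) into characters."""
    return html.unescape(fragment)


# Shaped like the API output quoted above (only the "name" field).
api_sample = '{"name":"\\u3061\\u3087\\u3073\\u3063\\u30c4"}'
# Assumed form of the scraped HTML text: numeric character references.
html_sample = "&#12385;&#12423;&#12403;&#12387;&#12484;"

print(decode_json_name(api_sample))
print(decode_html_text(html_sample))
```

Both calls should yield the same title string. Parsing the API response as JSON is usually simpler than unescaping HTML, since the field names are already separated from the markup.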

Prog Archives

Hello.

I see that the website Prog Archives [1] is not currently a valid identifier for Wikipedia pages.

I think it should be one for progressive bands and albums, just as Encyclopaedia Metallum [2] is valid for metal bands and albums.

What is your opinion on this?

--Astio k (talk) 20:07, 14 September 2018 (UTC)