User talk:Magnus Manske

Previous discussion was archived at User talk:Magnus Manske/Archive 9 on 2015-08-10.

Mix'n'Match : matching items from different sources, without item

Hsarrazin (talkcontribs)

Hello, sometimes, on M'n'M, in Creation candidates mode, I find entries (with no existing wd item), that can be matched between catalogs (often artists).

Is there a way to link these entries, so that when the item is created (and IDs added), all the entries can be linked to the same item? (A checkbox?) Example: Lambert Marshall (1810-1870) has 5 entries (ULAN/NPD/RKD/BMT/YourPaintings) and no item. It would be nice to create the item and autolink it to the 5 entries in one swift click ;D

Multichill (talkcontribs)

I could probably modify http://tools.wmflabs.org/multichill/painters/ so that you can just pass a name and that one is the only one showing up. Would that help?

Hsarrazin (talkcontribs)

@Multichill:

I don't understand what you mean: my question was not only about creating artists for your list of paintings but, more generally, about a way to link IDs so they can be added to an item, or used to create a new item… and "Lambert Marshall" is not on it.

as for looking for a specific painter, ctrl-f is ok with me.

Or... I don't understand at all how your tool works, which is very possible ;D I just use it to create painters, once I have checked that they do not already exist.

One very useful thing would be to use your tool to batch-add the IDs to an existing item, as many already exist :)

Vladimir Alexiev (talkcontribs)

@Hsarrazin: you can't correlate two entries before you have a WD entry for them: there is nowhere to put the data. Why not create the WD entry?

Hsarrazin (talkcontribs)

@Vladimir Alexiev: my question was about some way to click and join entries to put them directly on the created item, instead of creating an empty item and then adding entries one by one :)

something like checkbox near each entry... to select the right ones to add to new item :)
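The idea of selecting several catalogue entries for one not-yet-existing item could be sketched roughly as follows. This is only an illustration, not how Mix'n'Match works internally: the `Entry` type, the `group_unmatched` helper, and the placeholder IDs are all hypothetical, and it assumes entries can be grouped on name plus birth and death year.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """One unmatched catalogue entry (hypothetical structure)."""
    catalog: str
    external_id: str  # placeholder values below, not real identifiers
    name: str
    born: int
    died: int


def group_unmatched(entries):
    """Group entries that share a normalised name and life dates.

    Only groups with more than one entry are returned, since those are
    the candidates for creating one item carrying several IDs at once.
    """
    groups = defaultdict(list)
    for entry in entries:
        key = (entry.name.casefold().strip(), entry.born, entry.died)
        groups[key].append(entry)
    return {key: found for key, found in groups.items() if len(found) > 1}


entries = [
    Entry("ULAN", "ulan-placeholder", "Lambert Marshall", 1810, 1870),
    Entry("RKD", "rkd-placeholder", "Lambert Marshall", 1810, 1870),
    Entry("YourPaintings", "yp-placeholder", "lambert marshall ", 1810, 1870),
    Entry("ULAN", "other-placeholder", "Someone Else", 1800, 1850),
]
candidates = group_unmatched(entries)
for (name, born, died), found in candidates.items():
    print(name, born, died, [e.catalog for e in found])
```

Each group would still need a human check (the checkbox step) before an item is created, since matching names and dates alone can produce false positives.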

Reply to "Mix'n'Match : matching items from different sources, without item"
Multichill (talkcontribs)

Hi Magnus, I created Artsy artist (P2042). Artsy is already in mix'n'match, but missing the property. Can you add it so the collected data can be imported to Wikidata? Thanks

Magnus Manske (talkcontribs)

Done, syncing now.

Multichill (talkcontribs)

Great. Thank you.

Reply to "Artsy artist"

QuickIntersection v2 - get the corresponding Wikidata item for Commons

Jura1 (talkcontribs)

The tool currently throws an error message.

BTW, when it works, the option "Get the corresponding Wikidata item for the individual page" doesn't work for Commons.

Jura1 (talkcontribs)

Just noticed that version 3 works: http://tools.wmflabs.org/catscan3/quick_intersection.php but not version 2: http://tools.wmflabs.org/catscan2/quick_intersection.php

Jura1 (talkcontribs)

v2 works again. The Wikidata item would be helpful.

Magnus Manske (talkcontribs)

Works for me.

Jura1 (talkcontribs)

I know it does for Wikipedia, but not for Commons sample

Jura1 (talkcontribs)

in fact no: it works for ns 0, but not ns 14

Jura1 (talkcontribs)

ns 0 works at Wikipedia, but not at Commons

ns 14 doesn't work at either

Jura1 (talkcontribs)

For ns0, it works for Commons when using just "wiki" or "wikipedia" as site name

A quick patch for ns:14 could be to add "Category:" to the page title when the language = "commons" and ns=14 possibly somewhere here
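A minimal sketch of that suggested patch, written in Python purely for illustration (the tool's actual code and variable names are not shown here, so `title_for_item_lookup` and its parameters are assumptions):

```python
CATEGORY_NAMESPACE = 14  # MediaWiki namespace number for categories


def title_for_item_lookup(title: str, language: str, namespace: int) -> str:
    """Return the page title to use when looking up the Wikidata item.

    Hypothetical helper: for Commons category pages, the lookup needs
    the full title including the "Category:" prefix.
    """
    prefix = "Category:"
    if language == "commons" and namespace == CATEGORY_NAMESPACE:
        if not title.startswith(prefix):
            return prefix + title
    return title


print(title_for_item_lookup("Paintings by Rembrandt", "commons", 14))
```

Titles in other namespaces, or on other sites, pass through unchanged, so the change would be limited to the one case described above.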

Reply to "QuickIntersection v2 - get the corresponding Wikidata item for Commons"
Hsarrazin (talkcontribs)

Do you think it would be possible to import NGV artists into Mix'n'Match? https://www.ngv.vic.gov.au/explore/collection/artist/?surname=a

There are a lot of Australian and NZ artists, which could complement other artist databases. :)

Multichill (talkcontribs)

Hi Hsarrazin, I considered proposing a new property for this, but didn't do it yet. Did it right now at the proposal page. I guess Magnus can add it to mix'n'match when the property id is known.

Hsarrazin (talkcontribs)

Hi Multichill, and Magnus

in fact, a lot of the museums those paintings come from have interesting databases, but the most interesting are those that list artists not known in other places, like Finland, Denmark, Australia/NZ, etc :)

maybe we should collect those as we work on paintings, and make a list ?

Multichill (talkcontribs)

If I come across something that is a good addition (and not just more overlap with what we already have), I usually talk with some of the other people who work on art and propose it.

Multichill (talkcontribs)

Bump, I just created National Gallery of Victoria artist identifier (P2041). Could you add it to Mix'n'Match Magnus? Thank you.

Reply to "Mix'n'Match : NGV artists?"
Sjoerddebruin (talkcontribs)

The following IDs are missing:

  • 913
  • 1804
  • 470

They show up in the descriptions of other entries.

Reply to "Mix'n'match - CPAG, missing entries"

Listeria: query string for columns/sections

Summary by Sjoerddebruin

Going for separate lists.

Sjoerddebruin (talkcontribs)

I want to make lists for Dutch mayors, as easily as possible. Would it be possible to add an option to filter the output of a column? See User:Sjoerddebruin/Dutch politics/Mayors, where I want to hide the other positions.

Jura1 (talkcontribs)

Sometimes the reverse is interesting (what other offices did they hold).

Sjoerddebruin (talkcontribs)

Yeah, but for example: how can I fetch the tenure of the mayorship in this kind of list...

Jura1 (talkcontribs)

Currently this probably only works if you do a separate list for a specific value in position held (P39), such as the VT one. If you use only mayor (Q13423499) (instead of Mayor of Almere (Q15731141)), it could work.

Sjoerddebruin (talkcontribs)

Yup, I have User:Sjoerddebruin/Dutch politics/Mayors/Amsterdam but I wish it was easier to do this.

Jura1 (talkcontribs)

That's already a very long list ..

Multichill (talkcontribs)

Hi Magnus, as you know we now have over 100,000 paintings here on Wikidata. Quite a few of these paintings (around 20,000) are not linked to their creator yet. This number was higher, but I ran a bot that matches the painting to the painter based on the "painting by <someone>". I'm now coming to the point where I don't seem to be able to match much more. I was wondering if it would be possible to cross-reference some data sets and come up with good suggestions on who to create. Take for example Carl Newman (Q20821502): RKDartists, ULAN and SAAM all had an entry for him. I also made this list of unmatched painters (replace .txt with .sql for the query). Do you feel like having a shot at this?

Hsarrazin (talkcontribs)

Hello, sorry to intrude (I've worked for some time on creators on Commons)

J. M. W. Turner (Q159758), Thomas Girtin (Q714243), Christoffer Wilhelm Eckersberg (Q363823), no label (Q12899967), P. C. Skovgaard (Q2761358), Karl Anders Ekman (Q16171453), and probably many more, already exist on wikidata...

@Multichill: do you need help to find them all?

Multichill (talkcontribs)

Sure, some stuff in that list can be matched; matching with other datasets should turn that up too. The Turner/Girtin stuff is The Bank of a Lake or River with Hills Beyond (Q18571754) (two authors); that's something that I can't handle right now. If you see things in the list for which you know the Wikidata item, please add the name as an English label or alias (example). So yes, help much appreciated!

Hsarrazin (talkcontribs)

I just launched the Turner/Girtin stuff with Autolist, based on a claim request combined with "painting by Joseph Mallord William Turner, Thomas Girtin" on the en label (117 items). Tell me if it's alright, or not :)

Magnus Manske (talkcontribs)

@Multichill (how do I ping people in this?), made you a new tool: http://tools.wmflabs.org/wikidata-todo/relabel.php Have fun :-)

Hsarrazin (talkcontribs)

I think you don't need to ping the person who began the thread, it's automatically in his follow list, like it is in mine, too.... ;)

Magnus Manske (talkcontribs)

@Multichill Ah, sorry, I see yours is already a DB query result. Not quite sure what the question is here. Use the tool to get some good estimates for your unmatched painters; they need to be checked manually though.

Is the question who to make Creator: entries on Commons for? If so, why not all of them? As in, the ones that have a Wikidata item?

Hsarrazin (talkcontribs)

Hello, except for Mästare av Moskvaskolan and Mästare av Novgorodskolan (which are Swedish "Master of…"), all painters from the top down to Berndt Abraham Godenhjelm exist and have appropriate aliases.

Have fun.

Multichill (talkcontribs)

I hacked up a webtool to easily create the missing painters. You can play around with it at http://tools.wmflabs.org/multichill/painters/

Vladimir Alexiev (talkcontribs)

Is this the right place to discuss your tool? https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings#Multichill.27s_tool If not, move that section where it belongs

Multichill (talkcontribs)

No, it doesn't belong there. Moved it to the talk page.

Reply to "Matching creators"
Multichill (talkcontribs)

So this property was created some time ago and nobody noticed. Jane even proposed creating it earlier this week, and I supported that because neither of us knew about this property. The upside is that we won't have to wait a week. Could you add these artworks to mix'n'match? It is probably best to rename the existing catalog to something like "BBC Your Paintings artists" to avoid confusion. The logic is probably the same as with the painters, only with way more content.

Jheald (talkcontribs)

@Multichill, Jane023, Pigsonthewing, Charles Matthews: I suggest it might be worth thinking a bit about what's going to be the best way to go forwards with this, before jumping straight in to creating a catalogue on mix'n'match.

There are currently 212,000 paintings on Your Paintings -- this includes some by minor artists and in minor collections. We might ignore for example any by artists that have been marked Not Appropriate for Wikidata in the current Mix'n'match run. But it's still a very large number of items -- probably well over 150,000, since the minor artists tend to be represented by only a few canvases at most.

In contrast, I've run a few back-of-an-envelope searches at Wikidata:WikiProject UK and Ireland/paintings. Currently we have items for 4865 paintings marked as either located in the UK or from collections in the UK. This total is dominated by works by Turner in the Tate -- 3061 according to Autolist, added in bulk by (talkcontribslogs). Some of these may not be oil paintings (I didn't filter by material used (P186) because coverage is not particularly brilliant); but in contrast Your Paintings only has 392 canvases by Turner.

Excluding the extra Turners takes the number of potential matches down to 2196.

That's still not an exact estimate, because on the one hand some of those items are not oil paintings, whereas YP only contains oil paintings; while on the other hand, there may be some further items that are in the YP list, but didn't make my count because they didn't have either a collection or a location set. (Though the number may not be huge, because in total we only have 2277 items marked as paintings with no collection set (Autolist) -- but I suppose there may be further items not even marked as paintings. I guess assessing that depends on how comprehensively the category system has been mined, to set P31=painting. @Multichill:, perhaps do you have a handle on that?)

Anyhow, the point I'm making is that using Mix'n'match to try to match from YP into Wikidata, we'd be looking at a hit rate of only about 1.5%, which is miserable.
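The back-of-an-envelope figures above can be checked with a few lines of arithmetic. All numbers are taken from this post; the variable names are just labels, and 150,000 is the rough lower bound quoted earlier rather than a measured count.

```python
uk_painting_items = 4865         # Wikidata items located in or from UK collections
turner_items_tate = 3061         # Turner works in the Tate, according to Autolist
turner_on_your_paintings = 392   # Turner canvases on Your Paintings
yp_paintings_estimate = 150_000  # rough lower bound for Your Paintings entries

# "Extra" Turners are Tate items with no counterpart on Your Paintings.
extra_turners = turner_items_tate - turner_on_your_paintings
potential_matches = uk_painting_items - extra_turners
hit_rate = potential_matches / yp_paintings_estimate

print(f"{potential_matches} potential matches, hit rate about {hit_rate:.1%}")
```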

Matching from Wikidata into YP is probably a better strategy, but even then some care is needed -- for example, here are 22 hits for "Bacchus and Ariadne", which would rather swamp a title-only Mix'n'match search.

To be specific, the search really needs to include the collection and the artist, as well as title keywords. I'm not sure Mix'n'match can do that, out of the box. Collection+Artist seems to be possible at the Your Paintings site, eg example, but then it's not clear whether title keywords can be added. (We would also need to start populating BBC Your Paintings collection identifier (P1751) (Autolist) and BBC Your Paintings venue identifier (P1602) (Autolist) -- which are not equivalent, eg the various branches of the Tate).
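As a rough sketch of what such a combined search might look like, assuming the titles, artists and collections had already been harvested and reconciled to common identifiers: the record layout, the `is_candidate` helper, the stopword list and the placeholder artist and collection values are all hypothetical, and this is not something Mix'n'match is known to do out of the box.

```python
STOPWORDS = {"a", "an", "and", "of", "the"}


def title_keywords(title: str) -> set[str]:
    """Lower-case the title and drop a few common filler words."""
    return {word for word in title.casefold().split() if word not in STOPWORDS}


def is_candidate(wd_item: dict, yp_record: dict) -> bool:
    """Require the same collection and artist, plus a shared title keyword."""
    same_collection = wd_item["collection"] == yp_record["collection"]
    same_artist = wd_item["artist"] == yp_record["artist"]
    shared = title_keywords(wd_item["title"]) & title_keywords(yp_record["title"])
    return same_collection and same_artist and bool(shared)


# Placeholder records for illustration only.
wd_item = {"title": "Bacchus and Ariadne", "artist": "artist-A", "collection": "collection-X"}
yp_records = [
    {"title": "Bacchus and Ariadne", "artist": "artist-A", "collection": "collection-X"},
    {"title": "Bacchus and Ariadne", "artist": "artist-B", "collection": "collection-Y"},
    {"title": "Ariadne on Naxos", "artist": "artist-A", "collection": "collection-Z"},
]
matches = [record for record in yp_records if is_candidate(wd_item, record)]
print(len(matches), "candidate(s)")
```

With collection and artist as hard filters, a common title no longer swamps the results; the title keywords only have to break ties within one artist's works in one collection.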

It may be that harvesting the painting names + collections + artist, and then a bespoke search (or an adaptation of Mix'n'match search) has to be the way to go.

But I'm a bit nervous about that too, this time for political reasons. The PCF licence is pretty restrictive: its data is not reusable, and only individual collections are able to authorise re-use. It is accompanied by an "everybody go away" robots.txt file. Plus there's no legislated "fair use" provision in the EU Database directive. So there's a limit to how much we should lift without permission. Matching the PCF identifier might be legitimately de minimis, but it's questionable how much we should assimilate beyond that. On the other hand, if we did get permission, there is more in the PCF data we could add, eg media and dimensions, plus the institution itself might have additional valuable fields in its own database. And with permission (but only with permission), there would be no objection to wholesale creation of Wikidata items which did not already exist.

So the best next step forward might be to try to get permissions for data re-use from some of the most important of the galleries and museums -- eg the National Gallery London, and some of the others near the top of the list here.

Jane023 (talkcontribs)

I have already started linking painting items using this property. Check "what links here" on the property. I can easily spot the paintings that I added to Wikipedia through their artist portfolio pages. Of course this only works for the big-ticket items, generally at the National Gallery and so forth. I suppose we could ask permission, but as far as I know, matching their identifier to material we already have is fine. Creating items is a potential problem. We already have painting items for most of the top pieces, so we should just link those up. Now that Maarten is creating painting items for top collections, we should be able to link up UK collections to the PCF identifiers.

Vladimir Alexiev (talkcontribs)

I added info about the British Museum data at WikiProject sum of all paintings/Location/United Kingdom. But their license requires attribution.

Is the scope only Oil Paintings? I asked at WikiProject sum of all paintings#What is a Painting?

Reply to "BBC Your Paintings artwork identifier"
Jura1 (talkcontribs)

I tried to add Wikidata_list, but that doesn't seem to be sufficient. Would you kindly enable it?

Magnus Manske (talkcontribs)

Done.

Jura1 (talkcontribs)

Thanks. The category output would need a ":". Currently the report is added into all mentioned categories.

Jura1 (talkcontribs)

perfect! Thanks.