User talk:Magnus Manske


About this board

Previous discussion was archived at User talk:Magnus Manske/Archive 9 on 2015-08-10.

catalog/2171

Gerwoman (talkcontribs)

Hi Magnus. This catalog is 100% unmatched. Do you know what the problem is? Thank you

Reply to "catalog/2171"
Listeria bot

Милан Јелисавчић (talkcontribs)

Hi Magnus, I use Listeria mostly on the Serbian Wikipedia, and I have noticed a possible error in it: items are discarded the second time they should be presented in a list. You can see that the SPARQL code is correct, since wd:Q463896 appears twice in the query results (https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fimage%20%3FstartYear%20%3FendYear%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%7D%0A%20%20%3Fitem%20wdt%3AP39%20wd%3AQ11039843%20%3B%0A%20%20%20%20%20%20%20%20wdt%3AP31%20wd%3AQ5%20%3B%0A%20%20%20%20%20%20%20%20p%3AP39%20%3Fmandate_statement%20.%0A%20%20%0A%20%20%3Fmandate_statement%20ps%3AP39%20%3Fmandate%20.%0A%20%20%0A%20%20OPTIONAL%20%7B%20%3Fmandate_statement%20pq%3AP580%20%3FstartDate%20.%20%7D%20.%0A%20%20OPTIONAL%20%7B%20%3Fmandate_statement%20pq%3AP582%20%3FendDate%20.%20%7D%20.%0A%20%20OPTIONAL%20%7B%20%3Fitem%20wdt%3AP18%20%3Fimage%20.%20%7D%20.%0A%20%20%0A%20%20BIND%28str%28YEAR%28%3FstartDate%29%29%20as%20%3FstartYear%29%0A%20%20BIND%28str%28YEAR%28%3FendDate%29%29%20as%20%3FendYear%29%0A%20%20FILTER%20%28%3Fmandate%20%3D%20wd%3AQ11039843%29%0A%7D%20ORDER%20BY%20%3FstartDate).
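
For readability, this is the same query decoded from the URL above (lightly reformatted):

  SELECT ?item ?itemLabel ?image ?startYear ?endYear WHERE {
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    ?item wdt:P39 wd:Q11039843 ;
          wdt:P31 wd:Q5 ;
          p:P39 ?mandate_statement .
    ?mandate_statement ps:P39 ?mandate .
    OPTIONAL { ?mandate_statement pq:P580 ?startDate . } .
    OPTIONAL { ?mandate_statement pq:P582 ?endDate . } .
    OPTIONAL { ?item wdt:P18 ?image . } .
    BIND(str(YEAR(?startDate)) as ?startYear)
    BIND(str(YEAR(?endDate)) as ?endYear)
    FILTER (?mandate = wd:Q11039843)
  } ORDER BY ?startDate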

However, it appears only once in the list generated by Listeria (https://sr.wikipedia.org/wiki/%D0%A1%D0%BF%D0%B8%D1%81%D0%B0%D0%BA_%D1%80%D0%B5%D0%BA%D1%82%D0%BE%D1%80%D0%B0_%D0%A3%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D0%B7%D0%B8%D1%82%D0%B5%D1%82%D0%B0_%D1%83_%D0%91%D0%B5%D0%BE%D0%B3%D1%80%D0%B0%D0%B4%D1%83). Could you give me a hint on how to fix this?

Милан Јелисавчић (talkcontribs)

Hi Magnus! I still think this is a Listeria bot issue; everything around the SPARQL code seems to be in order. Could you help me with this?

Reply to "Listeria bot"

Mix'n'Match for the Norwegian national archive actors

Pmt (talkcontribs)

Good afternoon! I have an Excel file containing all archive actors (creators) of the National Archives of Norway, with links to their archives. Can you have a look at the file and see whether it is possible to make a Mix'n'Match catalog for it?

Pmt (talkcontribs)

PS: It's an Excel file, and I can send it to you if you want to look at it.

Magnus Manske (talkcontribs)
Pmt (talkcontribs)

Thank you, but where should the Excel file be stored then? Can I just use the Excel file, or should it be converted to a CSV file and pasted into the field below the row:

Text needs to be UTF-8. You can either paste the tab-delimited text into the box (for shorter catalogs), or upload it as a tab-delimited file (max. 20MB).

What does UTF-8 mean?
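
(UTF-8 is a standard text encoding that keeps all characters intact, including Norwegian letters like æ, ø and å; most tools can save text in it directly. As an illustration, converting an Excel sheet to the tab-delimited UTF-8 text the importer expects takes only a few lines of Python. A minimal sketch, assuming the pandas library (with openpyxl) is installed and "actors.xlsx" stands in for the actual file:

  import pandas as pd

  # Read the first worksheet of the Excel file
  df = pd.read_excel("actors.xlsx")
  # Write it back out as tab-delimited, UTF-8 encoded text
  df.to_csv("actors.tsv", sep="\t", index=False, encoding="utf-8")

The resulting actors.tsv can then be pasted or uploaded into the importer.)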

Pmt (talkcontribs)

Maybe this is outside Mix'n'Match, but I get this error when using the Mix'n'Match importer. Do you have any suggestions?

Magnus Manske (talkcontribs)

Just point me to the Excel file then...

Pmt (talkcontribs)

After a holiday, I would kindly like to ask where I can store the Excel file you are asking me to point at. Best regards

Magnus Manske (talkcontribs)

Oh come on, there are a million places where you can upload an Excel file; pick one and point me to the URL. Or make a Google Spreadsheet instead.

Pmt (talkcontribs)
Reply to "Mix'n'Match for the norwegian national archive actors"
Zcarstvnz (talkcontribs)

Thanks for taking the time to write the code for the banner search that I posted on the Project chat page yesterday. You went way beyond what I was expecting anyone to do, and I really appreciate your efforts. Best wishes for a great day! Zcarstvnz (talk) 09:53, 17 January 2019 (UTC)

template:Wikidata list

Blackcat (talkcontribs)

Hello Magnus, I need help: I am using your tool to create some tables for checking the properties on some items (see here). There's a problem: I just cannot get a qualifier value for a given item. For example, as you can see, I am trying to extract the P585 qualifier value from the P1352 property for each item that is an instance of "rugby union women's national team", but the column is blank and I just don't know how to get it. Can you give me a hint please? Thanks in advance, best regards.
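
One possible direction, assuming Listeria's property/qualifier column syntax (Pxxx/Pyyy) applies here - this is an assumption, so please check it against the Template:Wikidata list documentation - would be a columns parameter along the lines of:

  columns = label, P1352, P1352/P585

where P1352/P585 would show the P585 qualifier of each item's P1352 statement, instead of a standalone P585 column that stays blank.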

Reply to "template:Wikidata list"
Wikicite? and SourceMD?

ArthurPSmith (talkcontribs)

Hi Magnus - will you be attending the WikiCite meeting in San Francisco the week after next? The final day is a "hack day", and I had some thoughts about potential improvements to SourceMD. Whether or not you're there, what would be a helpful process for implementing improvements? In particular I was looking at the "new_resolve_authors" piece, whose status I'm not really sure of, but it looks like it is close to doing something significant for the author disambiguation I've been wanting to work on for ages...

Magnus Manske (talkcontribs)

Hi! Sorry, I won't be in SF. That said, I'm always happy to talk about new/improved tools, time permitting. "new_resolve_authors" is a crude solution to a real problem, but I'm not sure the codebase should be continued; rather, I'd like to see a different setup, based on better clustering. Ideally, there would also be a bot that creates new high-confidence author items in the background and changes the statements in the publications accordingly. Having access to high-quality author databases other than ORCID would be a boon.

ArthurPSmith (talkcontribs)

I don't think there's anything that compares to ORCID (VIAF and ISNI have some records for scholarly authors, but they are very incomplete from what I've seen). I expect there are internal publisher databases with more useful data (like author email addresses), but I don't know how we would get at those. The problem I'd like to address is handling authors from before ORCID existed, or who have never engaged with ORCID (and may now be deceased). I'd like to facilitate human curation; I'm not sure we're ready for bots to do this. My idea was to add some simple features to help decide which papers are associated with an author: I believe your tool right now looks at the names of coauthors, which is actually a good start. I'd like to add in journal title, publication date, and possibly citations and affiliations where we have any data on those.

The way I was thinking of proceeding was to clone the "new_resolve_authors" piece, edit it to just spit out a list of QuickStatements commands that can be run separately (rather than feeding them directly to the bot), and then try to recruit some people to test it and figure out ways to make it work better... Does that seem sensible? I guess I'll report back on how it goes...

Magnus Manske (talkcontribs)

Journal title etc. would work, especially for authors with "common" names, but it will miss some authorships (e.g. someone has a kitchen chat with you and puts you on their paper in a completely unrelated discipline/journal).

One reason I don't just start up QuickStatements myself is that, after the author item is created, the QID of that new item needs to be used as the value for the author statement changes. Theoretically that should work using the LAST keyword, but I'm not quite sure it does...

One could create the author item internally, and then run the rest in QuickStatements. Assuming there are no two identically named authors on that paper ;-)
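
For concreteness, the sequence under discussion would look roughly like this in QuickStatements V1 syntax (columns are tab-separated; the QID and the name are hypothetical, and whether LAST is accepted as a value in the last line is exactly the open question above):

  CREATE
  LAST     Len    "J. Smith"
  LAST     P31    Q5
  Q12345   P50    LAST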

Something that could improve "tasking" people with this would be to pre-generate likely candidates in a separate database, which would allow a set to work on to be served quickly.

ArthurPSmith (talkcontribs)

Some heuristics relating to name frequency might also help... Anyway, good suggestions, thanks!

ArthurPSmith (talkcontribs)
Magnus Manske (talkcontribs)

Nice! I tried it with myself, and even found a paper where I'm "just" a string author. The QS statement to add the P50 was generated correctly.

  • This needs an additional QS command to remove the string author.
  • If the string author carries a reference (some do), that reference needs to be added to the P50 statement before removing the string author, otherwise people will start yelling at you (guess how I know that?).
  • You can open QuickStatements with all the commands pre-filled by doing a POST request (recommended; GET requests get unpredictably truncated by the browser; see the sketch below) like:

https://tools.wmflabs.org/quickstatements/api.php?action=import&temporary=1&openpage=1&data=YOURCOMMANDSHERE

That will save the user from copy/pasting the whole thing, and looks a lot less messy :-)
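
For illustration, a minimal sketch of that POST in Python, using the parameters from the URL above ("commands" is a stand-in for the generated QuickStatements text; in a web tool you would typically use an auto-submitting HTML form instead, so the request runs in the user's own browser session):

  import requests

  # Hypothetical single command; Q4115189 is the Wikidata sandbox item
  commands = "Q4115189\tP31\tQ5"
  r = requests.post(
      "https://tools.wmflabs.org/quickstatements/api.php",
      data={"action": "import", "temporary": "1", "openpage": "1", "data": commands},
  )
  print(r.status_code)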

ArthurPSmith (talkcontribs)

Thanks! Yes, I was thinking about the references issue (and removing the string statements) ...

ArthurPSmith (talkcontribs)

I've updated it to do the QuickStatements post, and to add the references. Do you think we need to check for other qualifiers besides series ordinal on the original statement? I'll do a bit of testing and then add back in the delete statements...

ArthurPSmith (talkcontribs)

FYI this was rather popular at WikiCite - I had half a dozen or so people trying it out - thanks for getting it started and your suggestions so far! We fixed some bugs and I've done a bunch of refactoring; I have a few more ideas on the clustering angle I'm going to try out too.

ArthurPSmith (talkcontribs)

Hi Magnus - I've been continuing to update the "author disambiguator" and it's working well - I think close to 5000 cases (authors with multiple papers) have been processed with it so far (judging by the access logs).

I have, however, run into an issue with QuickStatements for some papers - specifically the ones with thousands of authors. An example is Q21481859 - just moving one "author name string" to an author is VERY slow with QuickStatements: it had only partially done the job after about 10 minutes, and it never added the "stated as" qualifier at all (you can see the most recent change on that item). I'm guessing it's hitting some sort of memory limit within QS or the API??

Reply to "Wikicite? and SourceMD?"

User:Magnus_Manske/duplicate_item.js

Addshore (talkcontribs)

Hi Magnus!

I just spotted this message in the logs.

"Blocked loading unprotected JS //www.wikidata.org/w/index.php?title=User:Magnus_Manske/duplicate_item.js for Jura1"

It looks like people won't be able to use that script in its current location and state. Should we protect it, or move it?

Reply to "User:Magnus_Manske/duplicate_item.js"

Structured Data - file captions coming this week (January 2019)

MediaWiki message delivery (talkcontribs)

My apologies if this is a duplicate message for you; it is being sent to multiple lists which you may be signed up for.

Hi all, following up on last month's announcement...

Multilingual file captions will be released this week, on either Wednesday, 9 January or Thursday, 10 January 2019. Captions are a feature for adding short, translatable descriptions to files. Here are some links you might want to follow before the release, if you haven't already:

  1. Read over the help page for using captions - I wrote the page on mediawiki.org because captions are available to any MediaWiki user; feel free to host/modify a copy of the page here on Commons.
  2. Test out using captions on Beta Commons.
  3. Leave feedback about the test on the captions test talk page, if you have anything you'd like to say prior to release.

Additionally, there will be an IRC office hour on Thursday, 10 January with the Structured Data team to talk about file captions, as well as anything else the community may be interested in. Date/time conversion, as well as a link to join, are on Meta.

Thanks for your time, I look forward to seeing those who can make it to the IRC office hour on Thursday. -- Keegan (WMF) (talk) 21:09, 7 January 2019 (UTC)
Reply to "Structured Data - file captions coming this week (January 2019)"
Captions in January

MediaWiki message delivery (talkcontribs)
The text of the previous message from today says captions will be released in November; January is the correct month. My apologies for the potential confusion. -- Keegan (WMF) (talk) 20:43, 7 January 2019 (UTC)
Reply to "Captions in January"

An enormous epidemiological database

Linuxo (talkcontribs)

Hi,

First, I'm so in awe of your work on Wikidata!

There's a database on the worldwide epidemiology of diseases. It is quite a huge amount of data: prevalence, incidence, DALYs, mortality, etc., for each country on the planet, for hundreds of diseases, classified by age distribution, sex ratio...

And it's under an Open Database licence (it's a work linked to the WHO)!

https://vizhub.healthdata.org/gbd-compare/


Okay, so I thought it would be wonderful to somehow get access to this from Wikidata ("In which African country is the prevalence of disease X among men higher than the prevalence of disease Y among women?", etc.).

They have no API on their site to access the data, but you can in fact download all of it as a massive dump.

What do you think would be the best way to incorporate the data with Wikidata?

- If I dump it into Wikidata, it would be an enormous volume of data (hundreds of countries, hundreds of diseases, multiplied by age range, multiplied by sex).

- If I link it using Mix'n'Match, the data won't be directly accessible: there is no API on their site to query by QID.


Thanks a lot in advance!


Reply to "An enormous epidemiological Database"