Hi Magnus. This catalog is 100% unmatched. Do you know what the problem is? Thank you
User talk:Magnus Manske
Hi Magnus, I am using Listeria mostly on the Serbian Wikipedia, and I have noticed a possible error in it. Namely, items are discarded the second time they should appear in a list. The SPARQL code seems to be correct, since wd:Q463896 appears twice in the query results (https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3Fimage%20%3FstartYear%20%3FendYear%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%7D%0A%20%20%3Fitem%20wdt%3AP39%20wd%3AQ11039843%20%3B%0A%20%20%20%20%20%20%20%20wdt%3AP31%20wd%3AQ5%20%3B%0A%20%20%20%20%20%20%20%20p%3AP39%20%3Fmandate_statement%20.%0A%20%20%0A%20%20%3Fmandate_statement%20ps%3AP39%20%3Fmandate%20.%0A%20%20%0A%20%20OPTIONAL%20%7B%20%3Fmandate_statement%20pq%3AP580%20%3FstartDate%20.%20%7D%20.%0A%20%20OPTIONAL%20%7B%20%3Fmandate_statement%20pq%3AP582%20%3FendDate%20.%20%7D%20.%0A%20%20OPTIONAL%20%7B%20%3Fitem%20wdt%3AP18%20%3Fimage%20.%20%7D%20.%0A%20%20%0A%20%20BIND%28str%28YEAR%28%3FstartDate%29%29%20as%20%3FstartYear%29%0A%20%20BIND%28str%28YEAR%28%3FendDate%29%29%20as%20%3FendYear%29%0A%20%20FILTER%20%28%3Fmandate%20%3D%20wd%3AQ11039843%29%0A%7D%20ORDER%20BY%20%3FstartDate).
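For reference, the query from that link URL-decodes to:

```sparql
SELECT ?item ?itemLabel ?image ?startYear ?endYear WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
  ?item wdt:P39 wd:Q11039843 ;
        wdt:P31 wd:Q5 ;
        p:P39 ?mandate_statement .

  ?mandate_statement ps:P39 ?mandate .

  OPTIONAL { ?mandate_statement pq:P580 ?startDate . } .
  OPTIONAL { ?mandate_statement pq:P582 ?endDate . } .
  OPTIONAL { ?item wdt:P18 ?image . } .

  BIND(str(YEAR(?startDate)) as ?startYear)
  BIND(str(YEAR(?endDate)) as ?endYear)
  FILTER (?mandate = wd:Q11039843)
} ORDER BY ?startDate
```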
However, the second occurrence is not presented in the list generated by Listeria (https://sr.wikipedia.org/wiki/%D0%A1%D0%BF%D0%B8%D1%81%D0%B0%D0%BA_%D1%80%D0%B5%D0%BA%D1%82%D0%BE%D1%80%D0%B0_%D0%A3%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D0%B7%D0%B8%D1%82%D0%B5%D1%82%D0%B0_%D1%83_%D0%91%D0%B5%D0%BE%D0%B3%D1%80%D0%B0%D0%B4%D1%83). Could you give me a hint on how to fix this?
Hi Magnus! I still think this is a Listeria bot issue. Everything around the SPARQL code seems to be in order. Could you help me with this?
Mix'n'Match for the Norwegian National Archive actors
Good afternoon! I have an Excel file containing all archive actors (creators) of the National Archive of Norway, with links to their archives. Can you have a look at that file and see if it is possible to make a Mix'n'Match catalog from this?
PS: it's an Excel file and I can send it to you if you want to look at it.
Hi, you can use https://tools.wmflabs.org/mix-n-match/import.php to import a tab-separated text/file (like you get from copy&pasting Excel) as a new catalog.
Thank you, but where should the Excel file be stored then? Can I just use the Excel file as-is, or should it be converted to a CSV file and pasted into the field below the row?
Text needs to be UTF-8. You can either paste the tab-delimited text into the box (for shorter catalogs), or upload it as a tab-delimited file (max. 20 MB).
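As a minimal Python sketch of producing that format from spreadsheet rows (the three-column layout used here is only an illustration; check the importer page for the exact columns your catalog needs):

```python
import csv
import io

def to_mixnmatch_tsv(rows):
    """Serialize rows as tab-delimited text, encoded as UTF-8.

    Each row is a tuple of column values, e.g. (external_id, name,
    description) -- an assumed layout for illustration only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

# One made-up example row for a hypothetical archive-creator catalog:
tsv = to_mixnmatch_tsv([("NO-000001", "Some archive creator", "Creator of fonds X")])
```

Saving the resulting bytes to a file (or pasting the decoded text into the box) should satisfy the UTF-8, tab-delimited requirement.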
What does UTF-8 mean?
Maybe this is outside Mix'n'Match itself, but I have this error when using the Mix'n'Match importer. Do you have any suggestions?
Just point me to the Excel file then...
After a holiday, I would kindly like to ask: where can I store the Excel file you are asking me to point to? Best regards
Oh come on, there are a million places where you can upload an Excel file, and point me to the URL. Or make a Google Spreadsheet instead.
Thanks for your patience, and please disregard the e-mail.
Hello Magnus, I need help: I am using your tool to create some tables that are useful for checking the properties on some items (see here). There's a problem: I just cannot get a qualifier for a given item. For example, as you can see, I am trying to extract the P585 value from the P1352 statement for each item that is an instance of "rugby union women's national team", but the column is blank and I just don't know how to get it. Can you give me a hint, please? Thanks in advance, best regards.
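In plain SPARQL, the statement/qualifier pattern for this usually looks like the sketch below (wd:Q999999 is a placeholder for the team class item, and the variable names are illustrative; whether Listeria's column syntax exposes the qualifier the same way is a separate question):

```sparql
SELECT ?item ?ranking ?pointInTime WHERE {
  ?item wdt:P31 wd:Q999999 .                      # placeholder: the team class item
  ?item p:P1352 ?statement .                      # full statement node for P1352 (ranking)
  ?statement ps:P1352 ?ranking .                  # the statement's main value
  OPTIONAL { ?statement pq:P585 ?pointInTime . }  # the P585 (point in time) qualifier
}
```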
Wikicite? and SourceMD?
Hi! Sorry, I won't be in SF. That said, I'm always happy to talk about new/improved tools, time permitting. "new_resolve_authors" is a crude solution to a real problem, but I'm not sure the codebase should be continued; rather, I'd like to see a different setup, based on better clustering. Also, ideally, a bot that, in the background, creates new high-confidence author items, and changes the statements in the publications accordingly. Having access to high-quality author databases other than ORCID would be a boon.
I don't think there's anything that compares to ORCID (VIAF and ISNI have some records for scholarly authors, but very incomplete from what I've seen). I expect there are internal publisher databases that have more useful data (like author email addresses) but I don't know how we get at those. The problem I'd like to address is handling authors from before ORCID existed, or who have never engaged with ORCID (and may now be deceased). I'd like to facilitate human curation, not sure if we're ready for bots to do this. My idea was related to just adding some simple features to help with deciding on what papers are associated with an author: I believe your tool right now looks at names of coauthors - which is actually a good start. I'd like to add in journal title, publication date, possibly citations and affiliations where we have any data on that.
The way I was thinking of proceeding was to clone the "new_resolve_authors" piece, edit it to just spit out a list of Quickstatements that can be run separately (rather than directly feeding it to the bot), and then try to recruit some people to test it and figure out ways to make it work better... Does that seem sensible? I guess I'll report back on how it goes...
Journal title etc. would work, especially for authors with "common" names, but it will miss some authorships (e.g. someone has a kitchen chat with you and puts you on their paper in some completely unrelated discipline/journal).
One reason I don't just start up QuickStatements myself is that after author item creation, the QID of that new item needs to be used as a value for the author statement changes. Theoretically that should work using the LAST keyword, but I'm not quite sure that works...
One could create the author item internally, and then run the rest in QuickStatements. Assuming there are no two identically named authors on that paper ;-)
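In QuickStatements v1 syntax (tab-separated), the sequence under discussion would look roughly like the sketch below; "J. Smith" is a made-up author name, and whether LAST is accepted as a *value* in the final line is exactly the open question above:

```
CREATE
LAST	Len	"J. Smith"
LAST	P31	Q5
Q21481859	P50	LAST	P1932	"J. Smith"
```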
Something that could improve "tasking" people with this could be to pre-generate likely candidates in a separate database, which would allow for quick serving of a set to work on.
Some heuristics relating to name frequency might also help... Anyway, good suggestions, thanks!
First stab at this up here: https://tools.wmflabs.org/author-disambiguator/ and https://github.com/arthurpsmith/author-disambiguator/ - not really expecting you to do anything on this, just to keep you informed! Thanks. I am aware some things are broken, but there's some basic functionality which works so... good!
Nice! I tried it on myself, and even found a paper where I'm "just" a string author. The QS statement to add the P50 was generated correctly.
- This needs an additional QS command to remove the string author
- If the string author contains a reference (some do), that reference needs to be added to the P50 statement before removing the string author, otherwise people will start yelling at you (guess how I know that?)
- You can open QuickStatements with all the commands pre-filled by doing a POST request (recommended; GET gets unpredictably chopped by the browser):
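Roughly, such a pre-filled form could look like the sketch below (the action URL and the "v1" field name are assumptions based on the tool's URL scheme, not verified against the QuickStatements code; Q12345 is a placeholder item):

```html
<!-- Hypothetical sketch: field name and endpoint are guesses, not verified. -->
<form method="post" action="https://tools.wmflabs.org/quickstatements/#/v1">
  <textarea name="v1">Q21481859	P50	Q12345	P1932	"J. Smith"</textarea>
  <input type="submit" value="Open in QuickStatements">
</form>
```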
That will save the user from copy/pasting the whole thing, and looks a lot less messy :-)
Thanks! Yes, I was thinking about the references issue (and removing the string statements) ...
I've updated it to do the QuickStatements post, and to add the references. Do you think we need to check for other qualifiers besides series ordinal on the original statement? I'll do a bit of testing and then add back in the delete statements...
FYI this was rather popular at WikiCite - I had half a dozen or so people trying it out - thanks for getting it started and your suggestions so far! We fixed some bugs and I've done a bunch of refactoring; I have a few more ideas on the clustering angle I'm going to try out too.
Hi Magnus - I've been continuing to update the "author disambiguator" and it's working well - I think close to 5000 cases (authors with multiple papers) have been processed with it so far (judging by the access logs).
I have, however, run into an issue with QuickStatements for some papers, specifically the ones with thousands of authors. An example is Q21481859: just moving one "author name string" to an "author" is VERY slow with QuickStatements. It only partially did the job after about 10 minutes, and it never added the "stated as" qualifier at all (you can see the most recent change on that item). I'm guessing it's hitting some sort of memory limit within QS or the API?
I just spotted this message in the logs.
"Blocked loading unprotected JS //www.wikidata.org/w/index.php?title=User:Magnus_Manske/duplicate_item.js for Jura1"
It looks like people won't be able to use that script in its current location & state. Should we protect it? or move it?
Structured Data - file captions coming this week (January 2019)
My apologies if this is a duplicate message for you, it is being sent to multiple lists which you may be signed up for.
Hi all, following up on last month's announcement...
Multilingual file captions will be released this week, on either Wednesday, 9 January or Thursday, 10 January 2019. Captions are a feature for adding short, translatable descriptions to files. Here are some links you might want to follow before the release, if you haven't already:
- Read over the help page for using captions - I wrote the page on mediawiki.org because captions are available for any MediaWiki user, feel free to host/modify a copy of the page here on Commons.
- Test out using captions on Beta Commons.
- Leave feedback about the test on the captions test talk page, if you have anything you'd like to say prior to release.
Additionally, there will be an IRC office hour on Thursday, 10 January with the Structured Data team to talk about file captions, as well as anything else the community may be interested in. Date/time conversion, as well as a link to join, are on Meta. Thanks for your time, I look forward to seeing those who can make it to the IRC office hour on Thursday. -- Keegan (WMF) (talk) 21:09, 7 January 2019 (UTC)
An enormous epidemiological database
First, I'm so in awe of your work on wikidata!
There's a database on the world epidemiology of diseases. It is a huge amount of data: prevalence, incidence, DALYs, mortality, etc., for each country on the planet, for hundreds of diseases, classified by age distribution, sex ratio...
And it's under an Open Database licence (it's work linked to the WHO)!
Okay, so I thought it would be wonderful to somehow get access to this from Wikidata ("What is the country in Africa where the prevalence of disease X is higher among men than the prevalence of disease Y among women?", etc.).
They have no API on their site to access the data, but you can download all of it as one massive dump.
What do you think would be the best way to incorporate the data with Wikidata?
- If I dump it into Wikidata, it would be an enormous volume of data (hundreds of countries, hundreds of diseases, multiplied by age range, multiplied by sex).
- If I link it using Mix'n'Match, the data won't be directly accessible: there is no API on their site to query from a QID.
Thanks a lot in advance!