User talk:Wolfgang8741

Duplicate catalogs

Hello Jonathan, I noticed that some of the catalogues you uploaded to mix'n'match are duplicates of others which you had uploaded earlier:

Perhaps @Magnus Manske: can help you merge these catalogues (or just delete one of them). Mahir256 (talk) 04:53, 16 December 2018 (UTC)

Thanks for the note. I became aware of the duplication right after hitting submit, doh! It is a limitation of Mix-n-match that uploaders cannot truly fix their mistakes; even marking a catalog as not active won't hide the old one. You beat me to pinging Magnus on those. It is nice to know people are actually using Mix-n-match (or at least watching), since I can't do all of these on my own. My technique in OpenRefine has matured, so I need to fix some older uploads that are incomplete and excluded some territories, etc. I know of a few other corrections needed in the descriptions, e.g. stray " , " and the fact that Louisiana has parishes, not counties. Well, learning is fun, and these errors don't harm the matching. Thanks again. Wolfgang8741 (talk) 05:10, 16 December 2018 (UTC)

GNIS Matches

Hi Jonathan, thanks for contacting me about helping with the GNIS catalogs on mix & match. I had fun working on them (love mix & match) and the best thing is, someone finds the effort useful! I'll continue to work on them. Hope you will come to one of the upcoming Wikimedia DC events in the New Year. Best wishes, Uncommon fritillary (talk) 02:56, 17 December 2018 (UTC)

New page for catalogues

Hi, I created a new page for collecting sites that could be added to Mix'n'match, and I plan to expand it with the ones that already have scrapers, grouped by category. Feel free to expand it or use it for property creation. Best, Adam Harangozó (talk) 10:45, 6 November 2019 (UTC)

Thank you for participating in the FindingGLAMs Challenge!

Thank you for participating in the FindingGLAMs Challenge!
By improving information about GLAM institutions on Wikidata, you made the Wikimedia projects better for everyone!

Alicia Fagerving (WMSE) (talk) 14:19, 16 March 2020 (UTC)

New OpenRefine reconciliation service

Hi!

Thank you for wearing the {{User loves OpenRefine}} userbox on your user page!

Because the existing Wikidata reconciliation service has had severe performance issues recently, I have created a new one which should be faster and more robust. You can add it to OpenRefine in the reconciliation dialog with the following URL: https://wikidata.reconci.link/en/api (or by replacing en by any other language code).
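As a rough sketch of the "replace en with another language code" step, the URL can be built from the pattern quoted above. The helper name `reconciliation_endpoint` is hypothetical, and the sketch does not check whether the service actually supports a given language code.

```python
def reconciliation_endpoint(lang: str = "en") -> str:
    """Build the reconciliation service URL for a language code.

    Hypothetical helper: it only fills the language code into the URL
    pattern quoted in the message above.
    """
    code = lang.strip().lower()
    if not code:
        raise ValueError("language code must not be empty")
    return f"https://wikidata.reconci.link/{code}/api"


if __name__ == "__main__":
    # Print the endpoint to paste into the OpenRefine reconciliation dialog.
    print(reconciliation_endpoint("fr"))
```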

If you have any issues with this new service, let me know.

Happy reconciling! − Pintoch (talk)

US State Legislators

Thanks for all your work on adding position held (P39) statements for the state legislators: this is really great and super-useful. I've been considering adding some of this myself, but if you're in the middle of an upload spree, I don't want to step on your toes with any of it. Are you hoping to get to some degree of completeness with any of it at the minute, or is that something that's likely to be more of a longer-term hope? --Oravrattas (talk) 08:32, 19 January 2022 (UTC)

Thanks. I've started this manual import of the current U.S. State Senators, covering their current position and term dates from the infobox on their Wikipedia page, to become more familiar with the data and the data errors in the officeholder infoboxes before considering any automated import or a tool for continuous verification across Wikidata and English Wikipedia. Incremental completeness is a longer-term goal, but my current scope is aimed at syncing Wikipedia and Wikidata to highlight the data gaps where both communities could work together on completeness.
If you tackle the current representatives, that won't interfere with my current scope and may be a worthwhile comparison of technique.
My objectives so far are:
  1. Become familiar with the English Wikipedia infobox data and its embedded errors, as well as the inconsistencies and missing Q items in Wikidata for the current position holders, and the data sync gaps.
  2. Import, for all current U.S. State Senate position holders, the position held, the start date of the current session, and replaces/replaced by, in sync with Wikidata, from the English Wikipedia infobox.
  3. Flag missing infoboxes.
  4. Identify where constraints and validation would improve infobox parsing for officeholders, while identifying pages and infoboxes needing cleanup in Wikipedia.
  5. Flag missing WikiProject working group claims for officeholders on English Wikipedia.
  6. Identify which external identifiers for politicians are missing from mix-n-match and would be useful to compare for coverage and details, e.g. Vote Smart ID (P3344) and Ballotpedia ID (P2390).
My thinking here is that current officeholders are an anchor from which to work back in time, filling out detail and working out issues with the data model as we go. For example, electoral district statements currently use the named district as the value, without accounting for district reshaping (i.e. redistricting) or including the years in which a given district shape existed. The importance of this depends on what detail Wikimedia and Wikidata users need to query. For now, linking to a single numbered district entity makes sense, as that is the level that exists in Wikipedia; further detail and differentiation can be added to the model later, when the need for higher resolution is warranted.
I don't have an immediate timeline for this work, more a direction of where it may go. I've mostly focused on geographic items in Wikidata rather than political officeholders, but thought WikiProject every politician would be a fun look into U.S. civics. Happy to hear any feedback or to coordinate on the data ingestion and sync. Wolfgang8741 (talk) 16:22, 20 January 2022 (UTC)
@Wolfgang8741: This is all really great! Are you pulling the data out of the infoboxes manually, or are there good tools for extracting/importing that information? One thing I'd like to do is set up some comparisons against an external source to help see where there are gaps. Unfortunately Open States usually only has useful identifiers for cross-matching when a state legislator has also been in Congress, so using it for bulk upload would require a lot of reconciliation first, but I suspect that simply comparing their data with what's already in Wikidata could be useful purely as a reporting tool / sanity check. It'll probably be a few weeks before I get a chance to look at any of that, but do you have a sense of what might be a good state to test with first? --Oravrattas (talk) 06:35, 25 January 2022 (UTC)
Yes, manually at the moment. The Harvest Templates function of Pywikibot might be a step toward automation, but based on what I'm seeing I would want to put some sanity checks and data checks in place between extracting from the templates and inserting into Wikidata through any automated means.
  • I took a look at OpenStates (this group is new to me, but I remember the original work of the Sunlight Foundation). It is cool and CC0, meaning their data is compatible for import, but currently their discussions and issues make no mention of Wikidata. A step forward might be to propose an OpenStates ID property on Wikidata and to reach out to OpenStates about adding a Wikidata Q field to their records, which would allow cross-linking and sanity checks with less need to resolve entities again. To help with the heavy lift of syncing and resolving identities, https://mix-n-match.toolforge.org/ might be a good tool, since it is a collaborative space and allows monitoring. It looks like OpenStates uses the format ocd-person/9b425a88-36ae-439c-b2f9-8da167b9ff27 for unique IDs for people. Have you worked with them or reached out to them?
  • Before I started, New Jersey had already been worked on, and it is what I modeled some of the structure on, but I don't know how functionally complete that data currently is compared to the other states I've started. If you would like to collaborate on a state, I'd be happy to focus my effort on one. I'd personally be more interested in Michigan, Ohio, or West Virginia. Wolfgang8741 (talk) 16:39, 28 January 2022 (UTC)
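For the ID format mentioned above, a quick format check before loading IDs into a catalog could look like the sketch below. The pattern is an assumption inferred from the single example ID in this thread ("ocd-person/" followed by a lowercase, hyphenated UUID), not taken from an OpenStates specification, and the function name `is_ocd_person_id` is hypothetical.

```python
import re

# Assumed pattern: "ocd-person/" followed by a lowercase, hyphenated UUID.
# Inferred from the one example ID in this thread, not from an official spec.
OCD_PERSON_ID = re.compile(
    r"ocd-person/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_ocd_person_id(value: str) -> bool:
    """Return True if the whole string matches the assumed ID format."""
    return OCD_PERSON_ID.fullmatch(value) is not None


if __name__ == "__main__":
    example = "ocd-person/9b425a88-36ae-439c-b2f9-8da167b9ff27"
    print(example, is_ocd_person_id(example))
```

A check like this only catches malformed strings; it says nothing about whether the ID refers to the right person, which still needs matching by hand or in Mix-n-Match.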
I reached out on the OpenStates Slack about cross-linking Wikidata and OpenStates and proposing an ID. They seem open to cross-linking if Wikidata IDs can be provided. I'm going to work on a Mix-n-Match catalog to assist with resolving one state and propose an OpenStates ID property to link back. This would help with your sanity check and give an idea of how much work it is to bring the workflows together. Both being CC0 is useful too. Wolfgang8741 (talk) 17:33, 28 January 2022 (UTC)
@Oravrattas: I've created a Mix-n-Match catalog for OpenStates people and will add the OpenStates property once the proposed ID is settled and approved. Wolfgang8741 (talk) 18:37, 10 February 2022 (UTC)
@Wolfgang8741: that's great! I've done some matching, but Mix-n-Match is a bit too unbearably slow for me at the minute, so I'll try coming back to it again tomorrow and hopefully it'll be a bit faster then. --Oravrattas (talk) 20:21, 10 February 2022 (UTC)
@Oravrattas: I've completed matches for 7259 IDs. The remaining 737 do not have an English Wikipedia article (maybe a few have been created since) and may have Wikidata IDs, but until the property proposal is approved and IDs can be added, there is not much motivation to dig into the additional data. I found issues with some names, so a blind import may not be the best approach, but definitely let me know if you do any import or sanity checks with the resolved OpenStates data. I also proposed the Michigan Legislative Bio ID, which may aid in validating Michigan politicians. Wolfgang8741 (talk) 06:01, 23 February 2022 (UTC)
Wow, that's phenomenal work! I've tinkered a little bit around the edges here and there, as I've come across relevant things, but it's likely to be at least another few weeks before I get the chance to go deep on any of it.
NB: I didn't get a notification of the Michigan Legislative Bio ID proposal, so it's possible that the 'ping project' didn't work. (It also looks like you might have doubled up the references to the Senate a couple of times: I'm presuming this was meant to be the House and Senate?) Oravrattas (talk) 06:13, 23 February 2022 (UTC)
Your contributions to the matching were appreciated; I definitely didn't do it all on my own. Thanks for the heads-up on the proposal. I fixed the double reference, you were correct. Wolfgang8741 (talk) 06:25, 23 February 2022 (UTC)

Call for participation in a task-based online experiment

Dear Wolfgang8741,

I hope you are doing well.

I am Kholoud, a researcher at King's College London, and I work on a project as part of my PhD research, in which I have developed a personalised recommender system that suggests Wikidata items for the editors based on their past edits. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I am inviting you to a task-based study that will ask you to provide your judgments about the relevance of the items suggested by our system based on your previous edits.

Participation is completely voluntary, and your cooperation will enable us to evaluate the accuracy of the recommender system in suggesting relevant items to you. We will analyse the results in anonymised form, and they will be published at a research venue.

The study will start in late January 2022 or early February 2022, and it should take no more than 30 minutes.

If you agree to participate in this study, please either contact me at kholoud.alghamdi@kcl.ac.uk or use this form https://docs.google.com/forms/d/e/1FAIpQLSees9WzFXR0Vl3mHLkZCaByeFHRrBy51kBca53euq9nt3XWog/viewform?usp=sf_link

I will contact you with the link to start the study.

For more information about the study, please read this post: https://www.wikidata.org/wiki/User:Kholoudsaa

In case you have further questions or require more information, don't hesitate to contact me at the email address above.

Thank you for considering taking part in this research.

Regards

Kholoudsaa (talk) 18:32, 28 January 2022 (UTC)