User talk:Pyfisch

Previous discussion was archived at User talk:Pyfisch/Archive 1 on 2019-02-24.

Jamy Oliver (talkcontribs)

Who are the Heroes of Mesolonghi?

Pyfisch (talkcontribs)

The Heroes of Mesolongi are revolutionaries of the Greek War of Independence; see also the German-language Wikipedia: de:Jean Moréas. I have shortened the description for Wikidata a little.

Reply to "Heroes von Mesolonghi"

Call for participation in the interview study with Wikidata editors

Kholoudsaa (talkcontribs)

Dear Pyfisch,

I hope you are doing well.

I am Kholoud, a researcher at King’s College London, and I work on a project as part of my PhD research that develops a personalized recommendation system to suggest Wikidata items for the editors based on their interests and preferences. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I would love to talk with you to learn how you currently choose the items you work on in Wikidata and to understand the factors that might influence that decision. Your cooperation will give us valuable insights into building a recommender system that can help improve your editing experience.

Participation is completely voluntary. You have the option to withdraw at any time. Your data will be processed under the terms of UK data protection law (including the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018). The information and data that you provide will remain confidential; it will only be stored on the password-protected computer of the researchers. We will use the anonymized results to provide insights into how editors select items to edit, and we will publish the results of the study at a research venue. If you decide to take part, we will ask you to sign a consent form, and you will be given a copy of this consent form to keep.

If you’re interested in participating and have 15-20 minutes to chat (I promise to keep to the time!), please either contact me at [] or [] or use this form https://docs.google.com/forms/d/e/1FAIpQLSdmmFHaiB20nK14wrQJgfrA18PtmdagyeRib3xGtvzkdn3Lgw/viewform?usp=sf_link with your choice of the times that work for you.

I’ll follow up with you to figure out the best way for us to connect.

Please contact me if you have any questions or require more information about this project.

Thank you for considering taking part in this research.

Regards

Reply to "Call for participation in the interview study with Wikidata editors"
Gamaliel (talkcontribs)

I'm very curious about this person. How do we know this is a hoax? Where did the hoax come from? The Philadelphia Inquirer article cited by the Dictionary of Art Historians doesn't seem to exist; I can't find it on newspapers.com.

Pyfisch (talkcontribs)

I've posted an explanation at the talk page why this is a hoax. I hope it will be useful for future reference.

Gamaliel (talkcontribs)

Excellent explanation, thank you for posting that.

Reply to "Janie Smiles Janie Smiles (Q19997543)"
Harpuny (talkcontribs)

Hello

Loki has given birth. He is genderfluid.

Infovarius (talkcontribs)

Formally, it was not him; it was his transformed form (a mare).

Pyfisch (talkcontribs)

Thanks for providing a link for this claim. I don't think transformations of gods usually determine what they are. For example, instance of (P31) bull (Q693690) for Zeus (Q34201) would be incorrect in my opinion. In addition, Loki is never described as genderfluid in mythology. (First, because the term didn't exist, but I also doubt that the Norse people thought of Loki in this way.)

Reply to "Loki"

FischBot: Gelöschte Höhenangaben

Haeferl (talkcontribs)

Hello Pyfisch! Your bot is deleting data from items that I had corrected, e.g. Weißer Knopf Q21862790, where the Ceb Wikipedia is now only listed as a reference for the GeoNames ID. If the bot deletes all data regardless of where it comes from, just because the Ceb Wikipedia still appears in one place, I don't think that is a good idea. It took days of work to correct all the data. Kind regards, --~~~~

Pyfisch (talkcontribs)

Hello Haeferl! In this particular case, the Cebuano Wikipedia was explicitly given as the reference for the elevation above sea level. That is why I cannot follow your complaint here.

However, the task also includes removing all unreferenced elevation values from items that have a sitelink to the Cebuano Wikipedia. The reason is that many of the elevation values were copied from that wiki without a source and are wrong. I have not yet started this part of the task.

Can we narrow down the elevations you corrected somehow? For example, did you only work on mountains in one country? Then I could process those items last while you add the source for the elevation values in the meantime.

Best regards

Haeferl (talkcontribs)

Yes, those were all the mountains in East Tyrol and a few in Lower Austria. However, I am currently still on the pre-jury of WikiDaheim and will only have time for this next week. Kind regards, --~~~~ P.S.: Why doesn't signing work here?

Pyfisch (talkcontribs)

All right! I won't edit any mountains in Austria until you let me know that all elevation values that should stay have a source.

PS: This talk page uses "Structured Discussions", which means your name appears above the post and the date below it. A signature would therefore be redundant and is not needed.

Haeferl (talkcontribs)

Thanks! One more thing I'd like to note: correcting the coordinates is easier while the wrong ones are still there. Hardly any of them were far off, so I usually found the summits on the map quite quickly. Once the wrong ones are gone, it becomes much more tedious ... Thanks also for the explanation about the signature, kind regards!

Pyfisch (talkcontribs)

The geographic coordinates stay; only the elevation is deleted by the bot.

Pyfisch (talkcontribs)

I made a mistake. I did check correctly whether a mountain is in Austria, but I still removed some of the elevation values that were unsourced or came from the Cebuano-language Wikipedia. If some of "your" mountains are affected: the change can easily be undone in the revision history.

Pyfisch (talkcontribs)

Hello Haeferl, I have deleted the elevation values from the Cebuano Wikipedia for the mountains and hills in all other countries. How far along are you with the mountains in Austria, so that I can complete the task?

Haeferl (talkcontribs)

Hello Pyfisch, sorry that I'm only getting back to you now. I'll do it this week. After the pre-jury I needed something other than piecework mouse-clicking, so I had to write an article first. Kind regards, Häferl

Haeferl (talkcontribs)

Hello Pyfisch, I hope you aren't getting bored because of me. ;-) I'd like to briefly ask for your help: Q21873086 has a typo in the title; it should be Winkeltalbach. Unfortunately I can't move it myself here. Thanks and kind regards, --~~~~

Haeferl (talkcontribs)

And the request has already resolved itself: the item doesn't need to be moved at all, only the German title needs fixing; I had forgotten that. Kind regards, --~~~~

Haeferl (talkcontribs)

Hello Pyfisch, I'm through with it now, so you can let your bot continue. And thanks for waiting! Kind regards!

Reply to "FischBot: Gelöschte Höhenangaben"
Sangoura (talkcontribs)

Thank you for suggesting a better way of formulating descriptions. Could you please specify which edits you are referring to? I can then update them.

Pyfisch (talkcontribs)
Reply to "October 2020"
Mrflip (talkcontribs)

Hiya, thanks for the style correction to my edits on osmina. Not sure if it was you or a bot, but it would be even more helpful to show what the before/after was: eg "I corrected

 An old Russian dry measure, approximately 105 litres.

To

 old Russian dry measure, approximately 105 litres

I've made the edit in the page for you, and appreciate the contribution!"

I spent some time trying to figure out what was wrong with the current state of the page before realizing you had edited the change in already; once I saw the history page all was clear.

Thanks for your work!

Pyfisch (talkcontribs)

Hi, thanks for writing the description in the first place! I'll think about how to make the talk page message clearer, so it is obvious that the description is already updated and it's easy to see what was changed.

Mrflip (talkcontribs)

Thought about it more — linking to the history page would do the trick if that's straightforward.

Reply to "Description policy"

You caused 30 million constraint violations

Multichill (talkcontribs)

With this edit. Please think about Commons in the future when doing updates like this.

Pyfisch (talkcontribs)

Sorry about that. I completely tuned out Commons when I made this edit. I will be a lot more cautious when restricting the allowed entity types.

Multichill (talkcontribs)
Pyfisch (talkcontribs)
This post was hidden by Pyfisch (history)
This post was hidden by Pyfisch (history)
Reply to "You caused 30 million constraint violations"
MisterSynergy (talkcontribs)

Hey Pyfisch, in User:Pyfisch/Counter-Vandalism you refer to the "magic summary" containing valuable information about the edit which Wikibase adds to (almost) all ns0 edits. It is clear what you mean by this term (the part between /* and */), but how do you split it up into action and parameter(s) such as language codes or wiki projects? Did you find some documentation for this feature somewhere, or did you just try to gather parameters heuristically?

Cescolino89 (talkcontribs)

Hello, I don't know why you corrected that data; anyone can commit theft, and what that person did was one of the biggest in Spanish refereeing. I ask that you change your correction.

Pyfisch (talkcontribs)

Hello MisterSynergy, [] describes the magic summaries in Wikidata. I used a regular expression in Python to parse the summaries: "/\* ([\w-]+):(?:(\d+)(?:\|([\w-]+))?)?.*? \*/.*".
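A minimal sketch of how the regular expression above could be applied in Python. The pattern is the one quoted in this post; the names `MAGIC_SUMMARY`, `MagicSummary`, `parse_magic_summary` and the field names `number` and `param` are hypothetical, and the sample summary in the comment is illustrative rather than taken from a real edit.

```python
import re
from typing import NamedTuple, Optional

# Pattern quoted in the post: action, optional number, optional first parameter.
MAGIC_SUMMARY = re.compile(r"/\* ([\w-]+):(?:(\d+)(?:\|([\w-]+))?)?.*? \*/.*")


class MagicSummary(NamedTuple):
    action: str            # e.g. "wbsetdescription-set"
    number: Optional[str]  # digits after the colon, if present (meaning assumed)
    param: Optional[str]   # first parameter, e.g. a language code or wiki id


def parse_magic_summary(summary: str) -> Optional[MagicSummary]:
    """Return the parsed magic part of an edit summary, or None if absent."""
    match = MAGIC_SUMMARY.match(summary)
    if match is None:
        return None
    return MagicSummary(*match.groups())


# Illustrative input: "/* wbsetdescription-set:1|en */ old Russian dry measure"
```

Only the first parameter is captured; anything after it inside the comment markers is skipped by the lazy `.*?`, which may be one reason the result is not as complete as one might wish.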

Hello Cescolino89, this is a summary of your changes: you entered that Juan Martínez Munuera is a thief and not a football referee, that he is not a man but a thief, and that he is a film and not a man.

MisterSynergy (talkcontribs)

Thanks, this is what I looked for, although it does not seem to be as complete as desired unfortunately.

MisterSynergy (talkcontribs)

FYI: I am currently using https://public.paws.wmcloud.org/User:MisterSynergy/misc/2020%2010%20unpatrolled%20changes/unpatrolled%20changes%20dashboard.ipynb to filter unpatrolled recent changes, in order to get an idea of the situation and to see what I can patrol. It was originally meant for myself only and not for presentation to others, which is why I did not present it in the 24hr-meeting. The Python code has become a bit lengthy by now, but it is not much more than some SQL querying (~2-5 min), pandas acrobatics, and graph decoration (both on the fly).

Yet, it helps me to find larger sets of items which can be batch-patrolled so that these revisions do not consume the time of other patrollers. We have currently patrolled roughly 40% of all unpatrolled changes, and I still find it difficult to filter the really problematic ones out of the remaining 100k unpatrolled edits. Any idea where to look, or how to filter? I have not figured out how to filter unpatrolled changes so that the actual vandalism shows up in reasonable numbers. I see some occasional instances of vandalism, but it really isn't much.

I also think that we can offer plenty more reports for other users to engage in this field. Whether they would do so, however, … I'm not sure. :-)

Pyfisch (talkcontribs)

Very interesting evaluation. I'm glad it helps you to batch patrol some changes, so others don't have to.

I have some ideas for how to filter the remaining changes, but they are usually a bit more complicated:

  1. Look for changed statements where the value was changed, but the references weren't. This is almost never correct and should be fixed. Sometimes this is done by vandals or in test edits.
  2. There are certain facts that never change and therefore the statements shouldn't be changed either. For people these are for example birth name, date of birth, place of birth.
  3. In theory, constraint violation reports or ShEx could be used to check and filter changes. But due to the huge number of inconsistencies found in Wikidata, there are already too many problems for human editors to fix.
  4. One strategy to prioritize unpatrolled changes could be to count the number of sitelinks to Wikipedia. Vandalism occurs most often on items for important topics or people, which will have Wikipedia articles; in addition, these items are on average more complete, meaning new editors will have less information to add.
  5. Filter changes that were made by globally locked users or users that are blocked on other Wikimedia projects. I sometimes find users that are already blocked as sockpuppets on English Wikipedia, but their bad edits remain on Wikidata.
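Ideas 2, 4 and 5 above can be combined into a simple ordering of the patrol queue. This is only a sketch under stated assumptions: the `UnpatrolledChange` record and its fields are hypothetical and would have to be filled from whatever query or API result is available, and the ranking order is just one possible choice. The property IDs are date of birth (P569), place of birth (P19) and birth name (P1477).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Properties whose values should practically never change for a person.
IMMUTABLE_PROPERTIES = {"P569", "P19", "P1477"}


@dataclass(frozen=True)
class UnpatrolledChange:
    """Hypothetical record for one unpatrolled revision."""
    rev_id: int
    property_id: Optional[str]    # property of the edited statement, if any
    sitelink_count: int           # number of Wikipedia sitelinks on the item
    user_blocked_elsewhere: bool  # globally locked or blocked on another wiki


def risk_key(change: UnpatrolledChange) -> Tuple[bool, bool, int]:
    # Blocked users first, then edits to "immutable" facts, then by sitelinks.
    return (
        change.user_blocked_elsewhere,
        change.property_id in IMMUTABLE_PROPERTIES,
        change.sitelink_count,
    )


def prioritize(changes: List[UnpatrolledChange]) -> List[UnpatrolledChange]:
    """Return changes with the most suspicious candidates first."""
    return sorted(changes, key=risk_key, reverse=True)
```

The ordering only decides what to look at first; each change still has to be checked by a person.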

I think the data quality issues and vandalism on Wikidata are linked. As long as Wikidata tolerates statements without sources it will be easier to add wrong information than to remove it. This especially applies to contentious claims, but also to facts like elevation of mountains where there is always some source. When no source is given for a new or changed claim, I usually either try to verify it, which takes me much longer than to add the claim, or I won't bother patrolling the change.

MisterSynergy (talkcontribs)

Thanks for the comment. Let me try to address all of your points:

  1. No clue how to query this right now. WMDE is working on a feature that somehow records these cases (value changed, but reference not) and warns users in case this happens unintentionally, but I don't know where they store this data and how to query it. Users can confirm that they did this intentionally as well.
  2. There can always be mistakes that are being corrected by IPs. In fact, I have seen quite a lot of valuable contributions by non-autoconfirmed users in the past weeks and think that the vast majority of unpatrolled changes were made in good faith and with relatively good knowledge about this project.
  3. ShEx is useful to compare an item to a desired data model (i.e. which properties to use, and so on), but IMO it is not helpful to detect vandalism. Additionally, it has not really gained traction yet and it does not perform well on a project scale, as you compare single items against the schema. Covi reports on the other hand, at least the ones by KrBot, suffer from KrBot's somewhat outdated and incomplete implementation that does not always comply with the official constraint definitions. I'd rather see them go away completely or be done by someone else, but KrBot's operator as the inventor of the covi system is still on it.
  4. Have a look at User:MisterSynergy/patrol/highly used items. Those are items which are in use in more than 500 Wikimedia pages in some way and have unpatrolled changes. This problem will be gone soon, since I got my admin-bot User:MsynABot approved and it will soon implement the RfC that we are going to semi-protect all items with 500 or more uses on Wikimedia pages. It just needs some minor modifications to the code to be done.
  5. This is an aspect which I have not yet considered in my script. We have discussed those API calls earlier and I found the database where a part of the information is coming from, but unfortunately some data such as local edit counts and local blocks are apparently directly being queried from all the servers on-the-fly. In other words: I don't think we can retrieve all information in the API call result for several users in a single query. We can just see when the global account was created from the central users database.

Generally I am not overly convinced by the narrative that Wikidata has a serious vandalism problem. Yes there is vandalism and some of it stays much longer than acceptable, but this is not a problem of a scale that should objectively threaten Wikidata's reputation. I know that in some of the larger Wikipedia projects (enwiki, dewiki, etc) there are influential editors of the core community propagating this notion, but they usually only show a few instances of Wikidata vandalism and then claim that this is a serious problem—without much knowledge about the actual situation. If we considerably increase our counter-vandalism activities, they would either not believe it to be effective, or find some other reason why they do not like (and want) Wikidata.

Pyfisch (talkcontribs)
  1. It is not possible to query right now. You'd have to get the old and the new revision via the API and compare them. The warning that is shown to users is temporary right now and not stored anywhere. However there is a Phabricator ticket to change this.
  2. Certainly. But you asked for ways to filter changes that are likely to be vandalism and these changes are comparatively likely to be vandalism. It still needs to be verified if they actually are.
  3. An additional challenge is that the constraints of properties are sometimes incomplete or need improvement. However, vandals commonly set values that have the wrong type. For example, a vandal replaced the first name of a person with shit.
  4. There are many items on Wikidata that don't have sitelinks at all and I don't think they are targeted by vandals that often. However the item for an author of an unpopular book that is read in schools may be targeted frequently.
  5. That's a pity. I don't know if this data is important enough to query from the API regularly.
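Point 1 above (get the old and the new revision and compare them) might look like the following once both revisions are available as entity JSON. This sketch assumes the standard Wikibase JSON layout (`claims`, `mainsnak`, `datavalue`, `references`); the function names are hypothetical, and fetching the two revisions from the API is left out.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def _statements(entity: Entity) -> Dict[str, Dict[str, Any]]:
    """Index all statements of an entity by their statement id."""
    claims = entity.get("claims") or {}  # an empty entity may carry [] here
    return {s["id"]: s for group in claims.values() for s in group}


def value_changed_but_references_kept(old: Entity, new: Entity) -> List[str]:
    """Return ids of statements whose value changed while references did not."""
    old_statements = _statements(old)
    suspicious = []
    for statement_id, new_statement in _statements(new).items():
        old_statement = old_statements.get(statement_id)
        if old_statement is None:
            continue  # newly added statement, nothing to compare against
        old_value = old_statement["mainsnak"].get("datavalue")
        new_value = new_statement["mainsnak"].get("datavalue")
        old_refs = old_statement.get("references", [])
        new_refs = new_statement.get("references", [])
        # Only flag sourced statements: the old source now backs a new value.
        if old_value != new_value and old_refs and old_refs == new_refs:
            suspicious.append(statement_id)
    return suspicious
```

Unsourced statements are deliberately not flagged here, since there is no reference that the new value could contradict; that is a design choice of this sketch, not a rule.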

It's true that the vandalism problem is exaggerated by these Wikidata critics. Right now I use a filtered display of recent changes: high risk changes (remove the unpatrolled filter to see more). I patrol these changes regularly; the majority are either vandalism or test edits and need to be reverted. The issue is that I have the impression that I am the only person who really patrols these changes with these tags. Don't misunderstand me, a lot of vandalism in these categories is reverted by other people before I have a chance, but right now I seem to be the only person specifically patrolling changes with these tags. Before I started, this list was a lot longer and contained changes that weren't patrolled for 10, 20, 30 days. And I don't think this vandalism is usually caught in Wikidata if there isn't a motivated person at the time.

MisterSynergy (talkcontribs)

Thanks, quite a helpful filter. I think I could replicate most of it in my script as well. Might be an idea to search for activity from similar IPs then.

As of now, I am trying to patrol on a per-user approach, i.e. judge whether a user edits in good faith or not, and then either patrol everything or undo whatever is vandalism. This is a problem that I see with most of the current patrol tools that take a per-revision approach, where it is often difficult to judge an edit due to the lack of context; the per-revision approach is also relatively expensive in terms of required patrol effort.

My intention is that we might eventually get to a point where (with some more editors) all edits, maybe with select characteristics, receive a patrol process by an experienced editor. "Terms in German" are for instance very doable, and there is no real backlog in spite of German being one of the most used languages here. However I am not sure whether we can get there on a significantly larger scale, to be honest.

We also need to consider that after 30 days all edits are practically treated as patrolled anyway, since this information resides in the recentchanges table, which is just a rolling snapshot of the activity in the past 30 days. I'd rather patrol good-faith, yet not perfect edits than let them obscure our view of the actually problematic edits.

Reply to ""magic summary""
Malikxan (talkcontribs)

Hi, Pyfisch! Thanks for Wikidata:Requests_for_deletions#Q99205011. Pages like this were created by an anonymous user. The Planetikio Movie article was created on the Uzbek Wikipedia (as far as I know, this article was created on many wikis). I checked this article for other wikis, found one, and then added a link. I am an administrator of the Uzbek Wikipedia; this non-encyclopedic article has already been deleted. Cheers! Malikxan (talk) 07:09, 18 October 2020 (UTC)

Pyfisch (talkcontribs)
Malikxan (talkcontribs)

Administrators on other wikis need to be active. :) Malikxan (talk) 08:32, 18 October 2020 (UTC)