Wikidata talk:WikiProject Counter-Vandalism


Great project1

Hi all, I think this is a great project, as Wikidata is a very important centre for all the MediaWiki sites. At the Dutch Wikipedia we have a similar thing which makes use of RTRC, a script from m:User:Krinkle; see this page for a manual. At the Dutch Wikipedia we also make a list based on this RTRC tool, which looks like this: [1]. Only problem: it is in Dutch, so I hope you can translate it via Google Translate or something else. Maybe it would be a great idea to implement that here on Wikidata too. I'd like to hear suggestions! Q.Zanden questions? 23:51, 19 May 2017 (UTC)

Thanks for this tip. I added a link to RTRC. --Pasleim (talk) 08:56, 20 May 2017 (UTC)[reply]
Very interesting tool, but right now it somehow lacks Wikidata-specific filter capabilities. If @Krinkle is willing to adapt it to the Wikidata RC workflow, we can provide some input here. —MisterSynergy (talk) 09:22, 20 May 2017 (UTC)[reply]
@MisterSynergy:, what would you like to add to the RTRC-tool? I opened an issue at github here to ask him to talk with us about some features we would like to add to RTRC. For me, the tool works fine! Q.Zanden questions? 18:46, 20 May 2017 (UTC)[reply]
Efficient RC work requires filtering, and this tool does not support many useful filter options for Wikidata, if I see it correctly. For unstructured Wikipedia edits it can distinguish between page creations and page changes. Wikidata has structured data and permits much finer filter options for changed items; some of them are even essential for effective RC work, as can be seen at the already existing tools (reCh, DiffLists.js, …). However, since a variety of similar tools is a desirable situation (competition in development, fits most needs), I’d be happy to see the following filter functions integrated:
  • Filter edits by type: changes in claims, terms (label, desc, alias), sitelinks, merging, other; (as in reCh, DiffLists.js)
  • Filter term edits further by language; Wikidata is inherently a multi-language project, most editors only understand a very small subset of them. No need to see all the changes in other languages; (as in reCh, DiffLists.js)
  • Filter claim edits further by property or properties; (as in DiffLists.js)
  • Filter sitelink edits further by wiki project; (as in DiffLists.js)
Additionally, “batch patrol” (as in reCh) is a process I used a lot in the past. There are sometimes anon users with a large amount of edits which easily clog the RC stream if you can’t effectively patrol their edits. —MisterSynergy (talk) 19:07, 20 May 2017 (UTC)[reply]
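As a rough illustration of the filter options requested above (by edit type, then by language, property, or wiki), the following sketch classifies an edit from its automatic summary. It assumes summaries of the form "/* wbsetlabel-set:1|en */ ..." and a hand-picked mapping from action names to edit types; both are assumptions for illustration, and classifyEdit and matchesFilter are hypothetical helpers, not part of RTRC, reCh or DiffLists.js.

```javascript
// Hypothetical mapping from Wikibase action names to the edit types listed above.
const ACTION_TYPES = new Map([
  ['wbsetlabel', 'terms'], ['wbsetdescription', 'terms'], ['wbsetaliases', 'terms'],
  ['wbsetsitelink', 'sitelinks'],
  ['wbsetclaim', 'claims'], ['wbcreateclaim', 'claims'],
  ['wbremoveclaims', 'claims'], ['wbsetclaimvalue', 'claims'],
  ['wbmergeitems', 'merging'],
]);

// Classify one edit from its auto-summary (assumed layout: /* action-sub:N|arg */).
function classifyEdit(summary) {
  const edit = { type: 'other', language: null, site: null, property: null };
  const match = /^\/\*\s*([a-z]+)(?:-[a-z-]+)?:\d+\|([^|*\s]*)/.exec(summary);
  if (!match) return edit;
  const [, action, argument] = match;
  edit.type = ACTION_TYPES.get(action) || 'other';
  if (edit.type === 'terms') edit.language = argument || null;
  if (edit.type === 'sitelinks') edit.site = argument || null;
  if (edit.type === 'claims') {
    // Claim summaries are assumed to link the property, e.g. [[Property:P31]].
    const property = /\[\[Property:(P\d+)\]\]/.exec(summary);
    edit.property = property ? property[1] : null;
  }
  return edit;
}

// Apply the filters: type first, then the type-specific refinement.
function matchesFilter(edit, filter) {
  if (filter.types && !filter.types.includes(edit.type)) return false;
  if (edit.type === 'terms' && filter.languages && !filter.languages.includes(edit.language)) return false;
  if (edit.type === 'claims' && filter.properties && !filter.properties.includes(edit.property)) return false;
  if (edit.type === 'sitelinks' && filter.sites && !filter.sites.includes(edit.site)) return false;
  return true;
}
```

A patroller who only reads German and English could then keep term edits with matchesFilter(edit, { types: ['terms'], languages: ['de', 'en'] }) and skip everything else.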

Bot patrolling

For a few months now, DeltaBot has been marking undone edits as patrolled, i.e. edits which no longer contribute to the current state of the page. I could extend this functionality to other kinds of edits:

  • users undoing themselves by adding/removing the content they removed/added a few seconds before
  • page moves on the client site
  • page deletions on the client site

What do you think, could it be too invasive? --Pasleim (talk) 09:54, 22 May 2017 (UTC)[reply]

Can we have those cases listed for a couple of weeks to get an idea about the impact (number of cases, fraction of cases where bot-patrol would be problematic for whatever case…)? —MisterSynergy (talk) 10:46, 22 May 2017 (UTC)[reply]
I strongly support the proposal by @Pasleim: and I also agree with @MisterSynergy:: a list for a couple of weeks would be very interesting. --Epìdosis 06:36, 25 May 2017 (UTC)[reply]
Right now DeltaBot patrols only one edit if a sequence of edits is undone: [2]. All revisions marked as reverted should be patrolled. Users often edit their own descriptions multiple times (point 1), so it would be nice if you only needed to patrol their last change. Maybe try to generally patrol all edits that don't affect the current page state? --Pyfisch (talk) 22:32, 8 October 2020 (UTC)
Thanks to upstream changes, the bot can now use change tags. --Matěj Suchánek (talk) 07:32, 9 October 2020 (UTC)[reply]
If changes were undone using "restore", DeltaBot patrols all revisions that happened after the restored revision. :-) --Pyfisch (talk) 09:39, 9 October 2020 (UTC)
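The idea of patrolling every edit that does not affect the current page state can be sketched as follows. This is not DeltaBot's code; it is a minimal illustration that assumes each revision comes with a content hash (for example the sha1 value that the revisions API can return) and that two revisions with the same hash have identical content. revisionsWithoutEffect is a hypothetical helper name.

```javascript
// revisions: oldest first, each { revid, sha1 }.
// Returns the ids of revisions lying inside a "loop" that ends with content
// identical to an earlier revision, i.e. edits with no net effect on the page.
function revisionsWithoutEffect(revisions) {
  // Remember the last position at which each content hash occurs.
  const lastIndexOf = new Map();
  revisions.forEach((rev, index) => lastIndexOf.set(rev.sha1, index));

  const withoutEffect = [];
  let i = 0;
  while (i < revisions.length) {
    // Everything after revision i up to the last revision with the same
    // content (including the restoring edit itself) cancels out.
    const j = lastIndexOf.get(revisions[i].sha1);
    for (let k = i + 1; k <= j; k++) withoutEffect.push(revisions[k].revid);
    i = j + 1;
  }
  return withoutEffect;
}
```

For a history with contents A, B, C, A, D this marks the second, third and fourth revisions and leaves only the edit that produced D for a human patroller. It covers both a multi-edit revert and a user undoing their own change, but it says nothing about whether the surviving edits are good.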

Putting a link to this project under Special:RecentChanges

I think it would be useful to put a link to this project under the utilities-list at https://www.wikidata.org/wiki/Special:RecentChanges . That way users who go to the "Recent change"-page to do counter-vandalism can discover this project and join it. ChristianKl (talk) 21:40, 24 May 2017 (UTC)[reply]

We need to inform the community about this project anyway. It was already announced in the most recent weekly newsletter, but I am not sure how many readers found it there. I am considering putting an “advertisement” on WD:PC. There are a couple of other pages which could also link to this project permanently. —MisterSynergy (talk) 05:18, 25 May 2017 (UTC)
After thinking a bit, how about sending a thank-you notification that refers to this project once a user has patrolled 100/1000 changes? It could also be triggered by undoing a certain number of edits by other people. ChristianKl (talk) 09:21, 26 May 2017 (UTC)
I am highly sceptical about that kind of reward. I do not think that it attracts the kind of RC workers we are seeking. The pure number of actions also depends on the tools one uses, and it does not really reflect the amount of work an RC worker has done.
It is important that we make RC patrolling as efficient as possible with great tools. We need to advertise them, and to improve them via feedback to the developers. In my opinion this is one of the main reasons for this WikiProject. —MisterSynergy (talk) 09:31, 26 May 2017 (UTC)
Okay, if you primarily want established editors in the project, I see that alerting newer users might not be the best strategy. I'm personally thinking a lot about how to get new editors who edit a bit exposed to more tasks within Wikidata. ChristianKl (talk) 12:35, 27 May 2017 (UTC)

description editing in the Wikipedia Android app

FYI Wikidata:Project chat#Wikidata description editing in the Wikipedia Android app --Pasleim (talk) 15:32, 5 September 2017 (UTC)[reply]

Yet another RC tool

Screenshot of the ORC tool

For the last few months, I have been developing my own JavaScript patrolling tool, ORC (Open-ended Recent Changes tool), and using it for counter-vandalism work here at Wikidata. It's not limited to Wikidata, but has at least some special features for it. While it does not have the filtering capabilities which were requested for RTRC just a few sections above this thread, it will help you to identify what the changed item is about by displaying the label and description (and aliases, and their languages, if not yours), the sitelinks (incl. preview popup), and the most important statements at a glance. Otherwise, it's focussed on patrolling, so it easily allows patrolling individual edits as well as all edits from a user or on a page. If you don't want to patrol the edits of a particular user, but also don't want to be bothered by their edits, you can ignore them. And so on. A more comprehensive list of features can be found on the page I linked. Or just try it out. I'd love to hear your feedback, especially as development of this tool is still very much in progress. It's not perfect for sure, but it helps me a lot with my RC work already, and I am eager to hear your input on how it could be improved. --YMS (talk) 20:32, 25 December 2017 (UTC)

@YMS: This sounds amazing, thanks! However, I had trouble getting it to work at first, I guess I'm not too familiar with how javascript imports should work, but what I ended up doing was adding:
mw.loader.load('//meta.wikimedia.org/w/index.php?title=User:YMS/orc.js&action=raw&ctype=text/javascript');
to my global.js on meta.wikimedia.org - Chrome was complaining about the content type being text/html with the load command you gave on the ORC page. ArthurPSmith (talk) 14:35, 26 December 2017 (UTC)[reply]
Hmm, not sure about this message. The loader call should be correct. Does the "ORC" tab appear on Special:RecentChanges (as it's a large script, loading sometimes may time out; so if it doesn't, best try reloading the page)? --YMS (talk) 15:25, 26 December 2017 (UTC)[reply]
Yes, the ORC tab appears now. ArthurPSmith (talk) 16:01, 26 December 2017 (UTC)[reply]
Oh, but it did not appear when I used the command you listed - I opened the Chrome Javascript Console to see what was going on and saw a message that it had not run the orc.js script due to text/html content type. So then I switched to the above which I believe forces a javascript type. ArthurPSmith (talk) 17:28, 26 December 2017 (UTC)[reply]
Actually, in my case it does not appear (I added the string to commons.js [3]).--Ymblanter (talk) 21:14, 30 December 2017 (UTC)[reply]
mw.loader.load( '//meta.wikimedia.org/w/index.php?title=User:YMS/orc.js&action=raw&ctype=text/javascript', 'text/javascript' ); // [[User:YMS/orc.js]] works in my case.
Anyway, nice tool—but I already miss sophisticated filter options. The “ORC” tab appears in some unexpected contexts, such as user javascript pages. Is this intentional? There are also some RC filters which the tool doesn’t understand (at least “ORC ignores unknown parameter: damaging__likelybad_color (c4)” and “ORC ignores unknown parameter: damaging__verylikelybad_color (c5)”). —MisterSynergy (talk) 21:21, 30 December 2017 (UTC)[reply]
Thanks, it works now for me.--Ymblanter (talk) 21:45, 30 December 2017 (UTC)[reply]
I already suspected that that kind of URL would work, but was waiting for Arthur to confirm this via mail. I will update the tool page. (Still strange that my Chrome didn't complain - add-on issue possibly?).
And yes, the "ORC" tab appears on JavaScript pages intentionally for testing purposes (for me as well as for any other developer, possibly also for users who want to have a very quick look).
With the unknown parameters you caught me twice: a) ORC indeed doesn't support highlighting for ORES yet (simply didn't think of that; though the ORES scores displayed are highlighted anyway), and b) the highlighting feature for tags should work, but actually is broken. I will check both tomorrow or the day after. Thanks for reporting! --YMS (talk) 22:03, 30 December 2017 (UTC)[reply]
I updated the documentation and fixed the issue with tag highlighting (which only affected certain tags). But I won't implement the ORES highlighting today as promised. In fact, any filter can be highlighted with the new RC component. I simply wasn't aware of this. But some of those things are already highlighted in ORC in different ways (e.g. IPs, which are color-coded). I will have to think of a more generic solution here, so this goes on my long-term list. Btw, more detailed filters for Wikidata are already on that list. Until then, if you depend on those filters, you'll probably have to stick with reCh and DiffLists (while I don't want to imply that having to use those is bad in any way). --YMS (talk) 10:55, 31 December 2017 (UTC)
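For readers who hit the same text/html content type problem, the working loader calls quoted in this thread share one URL pattern: index.php with action=raw and ctype=text/javascript. The sketch below only rebuilds that pattern; rawScriptUrl is a hypothetical helper, not part of MediaWiki, and it assumes page titles without "&" or "=" characters.

```javascript
// Build a raw-JavaScript URL for a user script page, following the
// pattern of the loader calls quoted in this thread.
function rawScriptUrl(host, title) {
  // MediaWiki titles use underscores for spaces; encodeURI keeps ":" and "/".
  const encodedTitle = encodeURI(title.replace(/ /g, '_'));
  return `//${host}/w/index.php?title=${encodedTitle}&action=raw&ctype=text/javascript`;
}

const orcUrl = rawScriptUrl('meta.wikimedia.org', 'User:YMS/orc.js');

// mw only exists on a wiki page (e.g. in a user's global.js),
// so guard the call when trying this elsewhere.
if (typeof mw !== 'undefined') {
  mw.loader.load(orcUrl);
}
```

The ctype parameter is what the comments above credit with making the script load as JavaScript rather than being rejected as text/html.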

Patrolling from user contributions, histories and diffs

A few weeks ago, we had $wgUseRCPatrol enabled on Czech Wikipedia. For this purpose, Martin Urbanec and I have also developed a simple gadget and API on ToolForge which allows marking changes as patrolled from history (phab:T25792), multi-edit span diff (phab:T10697) and user contributions (phab:T16352). Would you like to import this tool to Wikidata as well? Matěj Suchánek (talk) 08:39, 27 June 2018 (UTC)

Would it be complicated to import it to Wikidata? Your description sounds useful, but I have to be honest that I can’t imagine how useful it actually is just by reading the code of the script … ;-) —MisterSynergy (talk) 08:46, 27 June 2018 (UTC)[reply]
Apparently, after importing it, I noticed it doesn't work well for histories and diffs, only for contributions. We will look into it (the problem is we don't have appropriate indexes on database replicas). You can test it via importScript( 'User:Matěj Suchánek/patrolRevisions.js' ). Matěj Suchánek (talk) 09:43, 27 June 2018 (UTC)[reply]
On contributions lists I can see a new link “Mark all revisions as patrolled”, and it works. I can’t find anything new on diff pages or on page histories, so what am I supposed to do there? —MisterSynergy (talk) 10:00, 27 June 2018 (UTC)[reply]
Nothing. As I said, it doesn't work well there. Actually, it just takes very much time until the server responds. Matěj Suchánek (talk) 10:16, 27 June 2018 (UTC)[reply]
Okay; I’ll leave it activated for a while, to see how it’s going to develop. —MisterSynergy (talk) 11:22, 27 June 2018 (UTC)[reply]
Hello MisterSynergy! For your information, I have rewritten the gadget to query MW API instead, it should work very well now. Matěj Suchánek (talk) 11:36, 18 July 2018 (UTC)[reply]
Thanks, it’s a great script now! —MisterSynergy (talk) 20:59, 18 July 2018 (UTC)[reply]

Unpatrolled edits?

Hi! I've just observed a strange problem: I've made 4 rollbacks (704237515, 704237643, 704237756, 704238472) and, although I'm an admin, they are marked as needing patrolling. Is it a bug or a new policy? --Epìdosis 17:58, 29 June 2018 (UTC)[reply]

Another unpatrolled rollback (704230598), in this case by a rollbacker; I've just patrolled it. The problem is recent because I don't remember having encountered it last week. --Epìdosis 18:01, 29 June 2018 (UTC) Or this (703030631) by @LydiaPintscher:: the problem affects autopatrolled users, rollbackers and admins. --Epìdosis 18:06, 29 June 2018 (UTC)
I observed the same problem this morning. No idea what’s going on. —MisterSynergy (talk) 19:19, 29 June 2018 (UTC)
Subscribe to phab:T198449. Matěj Suchánek (talk) 19:44, 29 June 2018 (UTC)[reply]

Special flag for sitelink removals by IPs?

Could we get a special patrol group together to regularly review removal of sitelinks from items? I've run into dozens of cases recently where sitelinks were removed, and then shortly after duplicate (otherwise empty) items were created for those unlinked sitelinks. There seems to be a pattern of anonymous IP users doing this! Here's an example, which resulted about 10 days later in this action to create a new duplicate item. ArthurPSmith (talk) 21:41, 3 December 2019 (UTC)[reply]

 Support to creating something like this. --Epìdosis 22:19, 3 December 2019 (UTC)[reply]

Hello, I was curious what changes are made by anonymous and new users, and how well they are screened for vandalism. For this I made a few graphs.

Some additional interesting finds about the edits in the last 28 days (September 14 to October 11):

  • 159402 changes in total or ~5700 per day
  • 4.8% were reverted
  • 5.9% were patrolled (not counting reverted changes)
  • from the 48470 term changes (labels/descriptions/aliases):
    • 31% were in English
    • 67% were in one of the 10 most common languages (en, es, fr, de, it, ru, ro, eo, ar, ja)
    • 82% were in one of the 20 most common languages

--Pyfisch (talk) 13:34, 12 October 2020 (UTC)[reply]

@Pyfisch: Thanks. I'm surprised by a few of your "most common" languages, is that ranking based on these 48470 changes or something else? For the "new users", do you have any idea what fraction are active users on other wikimedia projects? ArthurPSmith (talk) 15:15, 12 October 2020 (UTC)[reply]
@ArthurPSmith: The ranking is based on these 48470 changes from the past 28 days (edits of type "wbsetlabel", "wbsetdescription" or "wbsetaliases"). Especially Esperanto (code: eo) was odd to see among the most commonly edited languages. I just checked and out of the 1422 changes for Esperanto terms 864 were made by User:Charp238 and 508 by User:Spondex, the remaining 50 edits were made by various IPs and users. Because there are relatively few edits a small number (or just one) active new editor can catapult a language into this ranking.
How many of the "new users" are active on other Wikimedia projects is a bit more difficult to answer. 94174 changes in this time period (or 59%) were made by anonymous users. Do you know an easy way to check whether the logged-in users are active on other Wikimedia projects? I can also email you the list of names. --Pyfisch (talk) 16:09, 12 October 2020 (UTC)[reply]
@Pyfisch: There's xtools but I don't know if there's an API to query for a list of user accounts there...? ArthurPSmith (talk) 16:52, 12 October 2020 (UTC)[reply]
Maybe such API calls could help. —MisterSynergy (talk) 17:57, 12 October 2020 (UTC)[reply]
@ArthurPSmith: Thanks for looking up the correct API call, MisterSynergy! I sampled 1000 random users out of the 10383 non-autoconfirmed users who edited Wikidata during the time. (The reason for only analyzing a random sample is that I didn't want to do 10k API requests)
From the sample of 1000 "new users" who edited Wikidata between September 14 2020 and October 11 there are:
  • registration time
    • 497 accounts were registered before 2020-01-01
    • 247 accounts were registered between 2020-01-01 and 2020-09-13
    • 256 accounts were registered after 2020-09-13
  • highest number of edits on another wiki:
    • 121 accounts only edited Wikidata
    • 201 accounts have between 1 and 9 edits on another wiki
    • 244 accounts have between 10 and 99 edits on another wiki
    • 224 accounts have between 100 and 999 edits on another wiki
    • 210 accounts have more than 1000 edits on another wiki
In conclusion almost half of "new" users are active editors on another wiki (>= 100 changes). A quarter of the accounts was very recently created and some people only create an account to edit Wikidata. Does this answer the question? --Pyfisch (talk) 21:04, 12 October 2020 (UTC)[reply]
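The bucketing used in the breakdown above can be sketched as follows. The sketch assumes each sampled user comes with a list of attached accounts shaped like { wiki, editcount } (for instance as a global user info lookup might return it); the field names, the 'wikidatawiki' identifier and the helper names are assumptions for illustration, not the script that produced the numbers.

```javascript
// Highest edit count of a user on any wiki other than Wikidata.
function maxEditsElsewhere(accounts, homeWiki = 'wikidatawiki') {
  return accounts
    .filter((account) => account.wiki !== homeWiki)
    .reduce((max, account) => Math.max(max, account.editcount), 0);
}

// Bucket boundaries as in the list above; the list leaves exactly 1000
// edits unassigned, here it is counted in the top bucket.
function bucketLabel(edits) {
  if (edits === 0) return 'only Wikidata';
  if (edits < 10) return '1-9';
  if (edits < 100) return '10-99';
  if (edits < 1000) return '100-999';
  return '1000+';
}

// Count users per bucket.
function tallyBuckets(users) {
  const counts = { 'only Wikidata': 0, '1-9': 0, '10-99': 0, '100-999': 0, '1000+': 0 };
  for (const user of users) {
    counts[bucketLabel(maxEditsElsewhere(user.accounts))] += 1;
  }
  return counts;
}

// Share of the sample with at least 100 edits elsewhere, from the figures above.
const activeElsewhereShare = (224 + 210) / 1000; // 0.434, i.e. "almost half"
```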
We should really dig more into this. Since the data about user accounts across all Wikimedia projects can be retrieved in a simple API call, I reckon that there is some database table somewhere that contains this information so that it can easily be queried—for plenty of users in a single query. This should avoid too many API calls. But… where is this table? :-)
Another interesting evaluation is a histogram of unpatrolled edits over Q-ID bin (containing 1 million items per bin). A substantial amount of unpatrolled edits is in either the very first or the very last (two) bins, out of currently 101 bins. —MisterSynergy (talk) 21:10, 12 October 2020 (UTC)[reply]
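A histogram over Q-ID bins like the one described above could be computed along these lines. This is only a sketch: binByQid is a hypothetical helper, and it assumes the input is a list with one item ID string (such as "Q42") per unpatrolled edit.

```javascript
// Count unpatrolled edits per Q-ID bin; with the default bin size,
// bin 0 holds Q1..Q999999, bin 1 holds Q1000000..Q1999999, and so on.
function binByQid(itemIds, binSize = 1000000) {
  const histogram = new Map();
  for (const itemId of itemIds) {
    const number = Number(itemId.slice(1)); // drop the leading "Q"
    const bin = Math.floor(number / binSize);
    histogram.set(bin, (histogram.get(bin) || 0) + 1);
  }
  return histogram;
}
```

Sorting the resulting map by count would surface the very first and very last bins that the comment above singles out.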
Maybe it would be useful to give some of the most active users, like User:Charp238, the autopatrol flag, assuming their edits are good. Ghouston (talk) 23:10, 12 October 2020 (UTC)[reply]
That user is already autoconfirmed, so they do not create any new "unpatrolled changes" anymore. —MisterSynergy (talk) 08:29, 13 October 2020 (UTC)
How did they manage to make 864 edits before they were autoconfirmed? Hmm, because there's a time requirement as well as an edit requirement? Ghouston (talk) 21:36, 13 October 2020 (UTC)[reply]
Yes, promotion to "autoconfirmed" usually happens after 50 edits *and* four days after the first edit. See Wikidata:Autoconfirmed users. All their edits are meanwhile patrolled using User:Matěj Suchánek/patrolRevisions.js on their user contributions list, btw. —MisterSynergy (talk) 21:42, 13 October 2020 (UTC)[reply]
@MisterSynergy: Which information do you want to gather about these users? What is the question you are trying to answer? With a sample of 1000 users the margin of error to the real numbers should be low, in the region of ±3%.
The bins look pretty much as expected: the first bin and the last few are most active, while the first bin has a higher percentage of reverts than average (30800 changes in first bin, 9.2% reverted). (I did the plots yesterday, I don't have time to recreate them right now and get the exact numbers.) --Pyfisch (talk) 17:28, 13 October 2020 (UTC)
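The ±3% figure for a sample of 1000 can be checked with the usual normal-approximation formula for a proportion. The sketch below assumes a 95% confidence level (z = 1.96) and the worst case p = 0.5; both are assumptions, since the comment does not say which were used. The optional finite population correction uses the 10383 non-autoconfirmed users mentioned earlier.

```javascript
// Margin of error for a sampled proportion (normal approximation).
function marginOfError(sampleSize, proportion = 0.5, z = 1.96, populationSize = Infinity) {
  const standardError = Math.sqrt((proportion * (1 - proportion)) / sampleSize);
  // Finite population correction; equals 1 for an unbounded population.
  const correction = Number.isFinite(populationSize)
    ? Math.sqrt((populationSize - sampleSize) / (populationSize - 1))
    : 1;
  return z * standardError * correction;
}

const plain = marginOfError(1000);                       // roughly 0.031
const corrected = marginOfError(1000, 0.5, 1.96, 10383); // slightly smaller
```

Under these assumptions both values round to about three percentage points, which matches the estimate given above.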
I'm considering whether we could make some more bot reports about unpatrolled recent changes and where to look at. I am not so much interested in snapshot information, although it has its value of course. Findings about experience in other Wikimedia projects could be quite valuable, but it is only feasible to include it if it can be queried in bulk. —MisterSynergy (talk) 18:22, 13 October 2020 (UTC)[reply]
Very interesting indeed. Was it complicated to compile this data? I think it would be useful to update it regularly, e.g. weekly or so, but without the table publication at Commons. —MisterSynergy (talk) 15:54, 12 October 2020 (UTC)[reply]
Compiling the data took some work and I needed some time to decide which information is interesting and to present it using Vega Graphs. The tables are produced by a script, I can easily set up a cron job to update the tables at Commons daily or weekly. Is there a reason not to publish these tables on Commons, as this seems to be the recommended way to include data in Vega graphs? Pyfisch (talk) 16:09, 12 October 2020 (UTC)[reply]
Really interesting data! Thank you very much! --Epìdosis 16:46, 12 October 2020 (UTC)[reply]
Okay, I missed the fact that you need the tables for the graphs. Fine to have them then. —MisterSynergy (talk) 17:59, 12 October 2020 (UTC)[reply]
  • An observation: by far not all reverted unpatrolled changes are marked as reverted with tags in the database. You probably underestimate the number of reverted changes with your evaluation. —MisterSynergy (talk) 08:26, 13 October 2020 (UTC)[reply]
    • Probably, as MediaWiki has a heuristic to detect reverts and it only considers the last 15 or so revisions. What also isn't taken into account is that another user may not restore the old description/label etc. but set a different good label. --Pyfisch (talk) 17:28, 13 October 2020 (UTC)[reply]

Project strategy proposal

For the past few days I have patrolled German terms and changes that were automatically tagged for vandalism, after reading the Cluebot discussion. I observed that a large portion of vandalism, test edits and otherwise problematic changes doesn't seem to be noticed and reverted. The graphs I published above confirm that only around 11% of changes by new users are either reverted or patrolled, although undoubtedly a higher number has been seen and found to be good by other editors. A filtered view of recent changes usually uncovers, within a minute of searching, additional vandalism that has been present in Wikidata for a few days. It is not all negative: many acts of vandalism and other problematic edits are reverted promptly, active vandals are quickly reported at the noticeboard and many non-notable and spam items are deleted. My view is that Wikidata currently does not have enough users active in counter-vandalism to effectively detect and revert almost all vandalism and fix problematic edits. (If you don't agree at all with this assessment, please say so.)

Proposal

I'd like to propose a few steps to remedy this issue:

--Pyfisch (talk) 16:17, 14 October 2020 (UTC)[reply]

Discussion
@Pyfisch: Ooh, I think a few of these can work together for good - a monthly patrollers report combined with advertising the project and tools. Could there be a permanent spot for patrolling/counter-vandalism in the weekly status update? ArthurPSmith (talk) 17:44, 14 October 2020 (UTC)[reply]
Yeah, I think patrolling should be more rewarding. However, absolute top-lists by month (or so) are not that useful; maybe just the top 10 users in alphabetical order. Motivated by Pyfisch's comment above, I have looked into unpatrolled recent changes for two days now and easily patrolled more than 22,000 revisions using User:Matěj Suchánek/patrolRevisions.js, which is more than 50% of all patrols in the past 30 days. There are indeed plenty of IP or newbie users who make something like 500 (unpatrolled) edits per month, with pretty decent quality. Their edits can be batch-patrolled with relatively little effort, so a pure number of patrols per month is not that meaningful. —MisterSynergy (talk) 18:08, 14 October 2020 (UTC)
I've added two new sections to WD:PATROL: Monitoring changed terms and Monitoring suspicious changes. The list above contains a few more useful filters, but I find these changes more difficult to review, so I didn't include them for now. What ideas do you have for improving this page?
When we ask users to review changes in their language, I'd like to link to WD:PATROL#Monitoring changed terms.
I created a user-box for this project: Template:User counter-vandalism. Which icon do you think would fit the project? Right now the user-box uses a shovel and a broom. The broom is associated with counter-vandalism work and reverting. Other projects like w:Wikipedia:Counter-Vandalism Unit use police-like badges, which I'd like to avoid. There is also File:Counter Vandalism Unit-en.png, but it doesn't fit because we are not a Wikipedia.
--Pyfisch (talk) 09:28, 17 October 2020 (UTC)[reply]
I think your modifications to WD:PATROL fit more into the scope of this WikiProject. It deserves some more development of subpages or so anyways. ---MisterSynergy (talk) 22:23, 17 October 2020 (UTC)[reply]
I think it is important to give information on how to effectively patrol recent changes besides looking at the unfiltered list. Especially for monitoring terms in various languages we need as many people as possible to patrol terms in their language. --Pyfisch (talk) 19:31, 18 October 2020 (UTC)[reply]

Intricate IP vandalism involving many items and sitelinks

Hi, do we have any page besides WD:AN to discuss individual vandals? Anyways 2001:56A:7790:C900:7127:B29A:DFB6:F8CB (talk • contribs • deleted contribs • logs • filter log • block user • block log • SUL (for IP: GUC)) makes a huge mess. They find two items that are about similar concepts and move the sitelinks from one item to the other (improper merge). However they don't let the first item be but reuse it for another completely unrelated Pokémon concept. When you try to rollback the first item it doesn't work because the sitelink has been added to the second item. So you need to restore the second one first. To make things more difficult they move sitelinks across multiple items i.e. sitelinks from A → B → C → D. Or they move sitelinks from two items to each other. Additionally they add claims referencing to the new Pokémon items they created. I tried to undo some of their changes but I am giving up because it is too complicated and I lost track what they actually tried to do. --Pyfisch (talk) 21:20, 20 October 2020 (UTC)[reply]

By the way, if you use rollback but it doesn't work because of sitelinks the versions are still marked as patrolled so the patrolled changes may not actually be patrolled. --Pyfisch (talk) 21:22, 20 October 2020 (UTC)[reply]
WD:AN only. Or try getting admin rights by yourself, maybe in a few weeks since you actually became active in August this year and half a year of committed participation is definitely more appealing to the voters. Nevertheless, as of today, I would vote "support" for sure ;-) ---MisterSynergy (talk) 21:36, 20 October 2020 (UTC)[reply]
Topic:Vwhjkmcv48f328lm is related, but I'd appreciate not to see related comments by other users in that very topic as the IP user seems not very used to discussions here and we do not want to confuse them. If anyone wants to provide additional information, please do so here or in a separate topic on my user talk page. —MisterSynergy (talk) 10:02, 25 October 2020 (UTC)[reply]

Counter-vandalism at the Wikidata:Eighth Birthday

On Wednesday and Thursday we are going to have this virtual birthday event, and I volunteered to be one of the facilitators of the anti-vandalism session. It will be on Thursday 29 October at 13:00, in English, and everybody is warmly welcome to attend and to participate in the discussion. Sorry to our North American friends; I just picked a time when I could make it. I do not have good ideas about what to do, but I guess we will be talking most or all of the time about developing new tools.--Ymblanter (talk) 19:35, 26 October 2020 (UTC)

Thank you for the invitation. After focussing on patrolling and anti-vandalism activities in recent weeks, I would love to attend to share my insight—but I can't unfortunately at that time. As there is apparently no recording planned, can you please provide a short summary of discussed topics after the session? —MisterSynergy (talk) 13:36, 28 October 2020 (UTC)[reply]
I will try.--Ymblanter (talk) 14:29, 28 October 2020 (UTC)[reply]
Thank you. You might have seen that I had the opportunity to be there, although unfortunately without mic. So, for me a summary is not really necessary, but maybe others might be interested. —MisterSynergy (talk) 14:00, 29 October 2020 (UTC)[reply]
Indeed, thanks for attending and for your contribution. Unfortunately I find it difficult to summarize the big picture differently from "there are a lot of good ideas around, but they need to be coded to become tools".--Ymblanter (talk) 19:30, 29 October 2020 (UTC)[reply]
Though we seem to have got some people interested in patrolling--Ymblanter (talk) 21:10, 30 October 2020 (UTC)[reply]
Are the slides from the presentations available somewhere? --Pyfisch (talk) 21:41, 30 October 2020 (UTC)[reply]
I can upload mine, though I do not believe they are of any value - I just used them to stimulate the discussion. About Houcemeddine, I can ask him.--Ymblanter (talk) 22:38, 30 October 2020 (UTC)[reply]

Expand resources in this WikiProject?

After the discussion yesterday I thought that we might want to expand the resources in this WikiProject. In particular, I think detailed tool descriptions, maybe workflow descriptions, and more filtered worklists etc. might be useful for less experienced editors to get into this. Any thoughts on that? —MisterSynergy (talk) 21:55, 30 October 2020 (UTC)[reply]

Yes, absolutely.--Ymblanter (talk) 22:38, 30 October 2020 (UTC)[reply]

User:Bene*/userwarn.js

For quickly notifying users with warnings, there is an old script User:Bene*/userwarn.js. Could it become part of the anti-vandalism toolkit? It does have some flaws I can think of:

  • (probably) not working on Flow pages (for newly registered users)
  • not having a complete list of warning templates (it's necessary to synchronize manually or implement loading from category)
  • primitive watchlist management (it could be useful to implement temporary watching talk pages when deployed here)

I can try to work on some of these if necessary. --Matěj Suchánek (talk) 10:31, 31 October 2020 (UTC)[reply]

There are a few such scripts. I am currently testing User:Ahmad252/scripts/UserWarning.js. One thing I don't like about the Bene* script is that it only adds an entry to the left menu. I'd prefer to have another button next to "edit" and "hist" like the merge and request-delete gadgets add. In addition the user interface and available warnings should be considered.
Right now I am using:
Some templates are probably used very rarely like uw-vandalism4, because users are usually blocked after they ignore the first warning. Certainly some users can't understand the English warning, so multilingual warnings may be helpful. --Pyfisch (talk) 21:07, 31 October 2020 (UTC)[reply]

Urge highly active IP users to use an account?

This might be a bit controversial, but I think we should at least discuss the idea: could and should we somehow urge "highly active" IP users to create and use an account which they can then bring to "autoconfirmed" status in order not to produce unpatrolled changes any longer? This would reduce the patrol workload on our side, as these users tend to show a high understanding of Wikidata and they usually are good-faith contributors as well—otherwise they would not be able to make that many edits. It would also be beneficial for the community as there would be a contactable editor in case of issues with the edits.

For some insight, I have looked at the past 30 days and show some statistics:

  • IP users with 2000 or more edits: 1; total output of these users: 3661 edits (2.2% of all unpatrolled edits)
  • IP users with 1000 or more edits: 4; total output of these users: 7899 edits (4.8% of all unpatrolled edits)
  • IP users with 500 or more edits: 12; total output of these users: 14.042 edits (8.5% of all unpatrolled edits)
  • IP users with 200 or more edits: 43; total output of these users: 22.914 edits (13.9% of all unpatrolled edits)
  • IP users with 100 or more edits: 99; total output of these users: 30.675 edits (18.6% of all unpatrolled edits)
  • IP users with 50 or more edits: 257; total output of these users: 41.747 edits (25.3% of all unpatrolled edits)
  • IP users with 20 or more edits: 663; total output of these users: 53.815 edits (32.6% of all unpatrolled edits)
  • IP users with 10 or more edits: 1265; total output of these users: 61.973 edits (37.5% of all unpatrolled edits)
  • IP users with 5 or more edits: 2582; total output of these users: 70.470 edits (42.6% of all unpatrolled edits)
  • IP users with 2 or more edits: 7988; total output of these users: 84.083 edits (50.9% of all unpatrolled edits)
  • IP users with 1 or more edits: 20.805; total output of these users: 96.900 edits (58.6% of all unpatrolled edits)

This does not even consider that IPs are transient, and one person might have accumulated edits on several "IP accounts" within 30 days. Based on my patrolling experience, this does indeed happen quite a lot and I can meanwhile guess in some cases just from the IP range which sort of edits to expect from the IP editor.
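A breakdown like the one above could be reproduced from per-IP edit counts with a few lines of Python. This is only a minimal sketch: the function name `threshold_stats`, the input mapping, and the toy numbers are hypothetical and are not taken from the script that produced the figures in this thread.

```python
def threshold_stats(edit_counts, total_unpatrolled, thresholds):
    """For each threshold, count the users with at least that many edits
    and sum up their edits, plus their share of all unpatrolled edits.

    edit_counts: hypothetical mapping of IP address -> edits in the period.
    """
    rows = []
    for threshold in sorted(thresholds, reverse=True):
        selected = [n for n in edit_counts.values() if n >= threshold]
        edits = sum(selected)
        share = round(100.0 * edits / total_unpatrolled, 1)
        rows.append((threshold, len(selected), edits, share))
    return rows


# Toy data for illustration only (documentation address ranges).
sample_counts = {"192.0.2.1": 12, "192.0.2.2": 5, "2001:db8::1": 1, "198.51.100.7": 2}
sample_rows = threshold_stats(sample_counts, total_unpatrolled=40, thresholds=[1, 2, 5, 10])

for threshold, users, edits, share in sample_rows:
    print(f"IP users with {threshold} or more edits: {users}; "
          f"total output: {edits} edits ({share}% of all unpatrolled edits)")
```

Because every row is cumulative, the same user is counted in each row whose threshold they meet, which is why the rows above cannot be added up.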

The question is where to draw the line beyond which we would rather see them use an account, and how we would do this. A software-controlled popup that comes up after every 10th edit (or so) is likely pretty annoying. Userwarn-like talk page messages are another option, but only work for currently active IP users. Any ideas? —MisterSynergy (talk) 22:50, 31 October 2020 (UTC)

  • I do think it's worthwhile to encourage users to register an account. Every 10th edit might be a bit much, but every 50th shouldn't be too annoying, and being regularly reminded to register an account could nudge them to actually do it. ChristianKl 23:24, 1 December 2020 (UTC)
  • Contacting these users is certainly worth a try. Alternatively, we could create a list of trustworthy IP users. A bot could then automatically patrol their edits. --Pasleim (talk) 19:47, 2 December 2020 (UTC)[reply]
    • Mh. Most IPs are not static, and some prolific IP editors share their ranges with other users who are problematic. If you want to contact them, you need to catch them while they are active. IPv6 addresses in particular tend to be extremely volatile, as usually only the first 64 bits of those IPs are somewhat static. I really had a software-based solution in mind, something that continuously reminds (and maybe also annoys) IP users to create an account. For them it comes with some benefits, such as watchlists, being contactable, being able to use tools, and building a reputation so as not to be reverted as much. —MisterSynergy (talk) 20:07, 2 December 2020 (UTC)
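The point about only the first 64 bits of an IPv6 address being reasonably stable suggests aggregating edit counts per /64 network rather than per address. A minimal sketch using Python's standard `ipaddress` module follows; the function names and the sample addresses are hypothetical, and IPv4 addresses are simply left ungrouped here.

```python
import ipaddress
from collections import defaultdict


def range_key(ip_text):
    """Return the /64 network for IPv6 addresses, or the address itself for IPv4."""
    ip = ipaddress.ip_address(ip_text)
    if ip.version == 6:
        # strict=False allows host bits to be set; they are masked out.
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ip)


def group_edits_by_range(edit_counts):
    """Sum edit counts of all addresses that fall into the same range key."""
    grouped = defaultdict(int)
    for ip_text, count in edit_counts.items():
        grouped[range_key(ip_text)] += count
    return dict(grouped)


# Toy data for illustration only (documentation address ranges).
sample = {"2001:db8:1:2:aaaa::1": 3, "2001:db8:1:2:bbbb::2": 4, "192.0.2.1": 5}
print(group_edits_by_range(sample))
```

Grouping this way can also merge unrelated people who share a range, which is the same caveat raised above about prolific editors sharing ranges with problematic users.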

Now possible to highlight wiki-tour items in recent changes

I'm not sure where to add this, but see phab:T246814 and the suggestion to add something like:

.mw-tag-tour-item { background-color: #CCFFCC; }

to common.css. Cheers, Bovlb (talk) 22:15, 1 December 2020 (UTC)[reply]

New tool: patrol statistics and batch patrol helper

Hey all, in the past weeks I have patrolled a substantial amount of revisions, and I made myself a new tool to help me doing so: see msbits.toolforge.org/wdcvn. It is still work in progress, and turned out to have basically two components:

Patrol statistics
The tool provides various insights into the patrol process over the past 30 days, and how much of the unpatrolled changes have meanwhile been patrolled. All of that is being broken down by several different factors such as time, properties, languages, and so on. It really helps to get an idea about what non-confirmed users are actually doing here at Wikidata. All numbers are being updated hourly.
User- and item-based patrol worklists
I found that it is pretty efficient to use batch-patrolling for users who make a lot of unpatrolled edits, as opposed to the single-revision-based patrol process that we are so used to. The tool helps to identify such users with many unpatrolled revisions and provides deeplinks to their Special:Contributions page where I use the patrolRevisions.js user script to efficiently patrol all of their revisions. This way, I can roughly patrol 50% (~80.000 per month) of all unpatrolled changes. The worklists are also being updated hourly, and "done" stuff disappears from the reports with the updates.

It is important to mention that the tool itself is not supposed to interact in any way with Wikidata directly; it merely points to places onwiki where batch patrolling should then be done using the patrolRevisions.js script. I also use the DiffLists.js script to get a good idea of user changes.
As this tool was initially not meant to be useful for anyone else but myself, I am aware that it might be a bit quirky and not as polished as it should be. I am open for all sorts of input, and I explicitly mention here that further reports or statistics are relatively simple to add, since the data structure in the background allows a huge variety of efficient evaluations. So, if you have ideas, do not hesitate to contact me and make suggestions, please. —MisterSynergy (talk) 00:20, 9 December 2020 (UTC)[reply]
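The idea of a user-based worklist, as described above, can be sketched roughly as follows: group unpatrolled revisions by user, keep the users with many of them, and emit deep links to their Special:Contributions page. This is an illustrative sketch only and not the tool's actual code; `build_worklist`, the input format, and the `min_edits` cut-off are hypothetical.

```python
from collections import Counter
from urllib.parse import quote

CONTRIBS_URL = "https://www.wikidata.org/wiki/Special:Contributions/"


def build_worklist(revisions, min_edits=2):
    """Return (user, unpatrolled count, contributions link), busiest users first.

    revisions: hypothetical iterable of (user name, revision id) pairs
    for revisions that are still unpatrolled.
    """
    counts = Counter(user for user, _rev_id in revisions)
    worklist = []
    for user, count in counts.most_common():
        if count < min_edits:
            continue
        link = CONTRIBS_URL + quote(user.replace(" ", "_"))
        worklist.append((user, count, link))
    return worklist


# Toy data for illustration only.
sample_revisions = [
    ("192.0.2.1", 1), ("Example user", 2), ("192.0.2.1", 3),
    ("192.0.2.1", 4), ("Example user", 5), ("198.51.100.7", 6),
]
for user, count, link in build_worklist(sample_revisions):
    print(count, user, link)
```

Like the tool described above, the sketch never writes to Wikidata; it only points to the page where batch patrolling would then be done by hand.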

Thanks! I've found patrolRevisions.js really helpful (I just used it a few minutes ago to patrol a dozen good changes by a new user). Looking forward to trying this out too. ArthurPSmith (talk) 15:52, 9 December 2020 (UTC)[reply]
With your tool, we easily reach 250 patrols reviewed / h. Good idea. —Eihel (talk) 06:22, 10 December 2020 (UTC)[reply]

An update: my patrol helper tool has been moved to a separate tool account, so the URL is now:

I also completely rewrote the frontend (switched from PHP to a Python Flask app), updated the UI, made the tool name simpler, and polished the content a bit. Happy to accept comments regarding the tool, particularly its UI which is clearly not my field of expertise. —MisterSynergy (talk) 20:09, 10 June 2021 (UTC)[reply]

About vandalisms to labels/descriptions/aliases

Hi all! Most of the vandalism I find concerns labels/descriptions/aliases of items about well-known subjects (the last I've reverted, today) and is made by IPs (this is just my impression, based on a daily check of my watchlist containing about 70k items). At the moment we have automatically semi-protected all properties (Wikidata:Requests for comment/General semi protection for all property pages) and all items used by 500+ Wikipedia pages (Wikidata:Requests for comment/semi-protection to prevent vandalism on most used Items). I would like to discuss one further step here, which would not be technically possible at the moment but probably will be in the future. I would formulate it as follows: semi-protecting existing labels/descriptions/aliases in items containing 20+ (the number might be changed) sitelinks. This would be similar to what has been applied in the two cases above, but with two relevant differences: it doesn't involve statements, and it doesn't involve non-existing labels/descriptions/aliases, so translations in minority languages could still be added. The main technical difficulty is that it is still impossible to protect single parts of Wikidata items (phab:T189412), not to mention the differentiation between existing and non-existing labels/descriptions/aliases, which is probably much more complex (I think a Phabricator ticket doesn't exist yet, but I may be wrong). I stress that, if the aforementioned differentiation were available, I would support applying it to all properties (and maybe also to the most used items) instead of the existing total semi-protection. Opinions? Thank you all, --Epìdosis 16:24, 19 December 2020 (UTC)

I think it would technically not be very difficult to implement such a "protection" using an AbuseFilter which evaluates the edit summary. The first part of it does indicate which content is being edited, and it differentiates between addition, modification, and removal of labels, descriptions, and aliases. I do not have much experience using edit filters and I am not sure whether the performance of such a filter would be acceptable, but it should be possible immediately. Maybe we can define such a filter which logs only, to test this approach.
However, I have to mention that we see plenty of pretty productive IP activity—around 100.000 edits per month. At this point, I would rather see more patrolling than more edit restrictions. —MisterSynergy (talk) 16:37, 19 December 2020 (UTC)[reply]
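To illustrate how the first part of an edit summary could be evaluated, here is a minimal Python sketch (not AbuseFilter syntax) that classifies term edits as addition, modification, or removal. The summary shapes it matches, such as `/* wbsetlabel-set:1|en */`, are an assumption about what Wikibase auto-summaries look like and should be checked against real revisions; the function name is hypothetical.

```python
import re

# Assumed shape of a Wikibase auto-summary: "/* wbsetlabel-set:1|en */ new label".
SUMMARY_RE = re.compile(
    r"/\*\s*wbset(label|description|aliases)-(add|set|update|remove):\d+\|([\w-]+)\s*\*/"
)


def classify_term_edit(summary):
    """Return (term type, kind of change, language code), or None for other edits."""
    match = SUMMARY_RE.match(summary)
    if match is None:
        return None
    term, action, language = match.groups()
    # "add" and "remove" are unambiguous; everything else is treated as a modification.
    kind = {"add": "addition", "remove": "removal"}.get(action, "modification")
    return term, kind, language


print(classify_term_edit("/* wbsetlabel-set:1|en */ Douglas Adams"))
print(classify_term_edit("/* wbsetdescription-add:1|de */ britischer Schriftsteller"))
print(classify_term_edit("/* wbcreateclaim-create:1| */ [[Property:P31]]: [[Q5]]"))
```

A log-only filter along these lines would then additionally need the item's sitelink count and the editor's status, neither of which is covered by this sketch.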
Of course I agree about the need for more patrolling and about the productivity of IPs, who often do very good work. However, while the additions of new labels/descriptions/aliases by IPs are usually good, I'm not sure that the edits to existing labels/descriptions/aliases made by IPs in very important items are of such good quality; having statistics, even on a small sample, would be very interesting in order to verify whether my impression is correct. --Epìdosis 16:42, 19 December 2020 (UTC)
I think that the number of edits to existing labels/descriptions/aliases is much smaller than the number of additions of new labels/descriptions/aliases, so compiling statistics might not be so problematic. If I can give some help, especially in this period around the Christmas holidays, I'm surely interested. --Epìdosis 16:45, 19 December 2020 (UTC)
Okay, since I have all the info in a Python script available anyways, I can easily provide some numbers based on the editing in the past 30 days:
  • We had around 10.000 modifications of labels/descriptions/aliases (i.e. not additions or removals, just modifications of existing ones) by IP editors; I did ignore a small percentage of changes here, as they are pretty complex. The following numbers correspond to the total amount of ~10.000 modifications and the time period of 30 days.
  • Around 55% of them are description changes, 36% label changes, 6% alias changes, and the rest are modifications of all three at the same time.
  • Languages: 44% English, 14% Spanish, 6% Italian, 4% Russian, 4% French, 3% German, all others less than 2%/200 edits each.
  • The ~10k edits have been made on ~7k different items.
  • Out of the 7k items, ~2k have 20 or more sitelinks; there were ~2.8k relevant changes on these 2k items
  • Out of the ~2k items with 20 or more sitelinks, more than 500 are "highly used" and going to be indefinitely semi-protected rather soon anyways. Fewer than 1500 are not "highly used" and will not be semi-protected.
  • The <1500 items with 20 or more sitelinks that are not "highly used" saw fewer than 2000 modifications of terms by IP editors, out of which a bit more than 400 have been reverted (21%); 79% of these changes have not been reverted, but there can of course still be some undetected vandalism; from my patrolling experience, most has probably already been reverted, though.
There you have it :-). Your proposal from above was basically to block the <2000 edits in the last bullet point. —MisterSynergy (talk) 00:46, 20 December 2020 (UTC)[reply]
@MisterSynergy: OK, very interesting data! In fact, the number of items involved is smaller than I thought, and the percentage of reverts is also much smaller than I expected (I thought at least 50%!). Just to have a look, if it is not too difficult for you, could you paste, e.g. in User:Epìdosis/Check edits, some 100 or 200 links to the 79% of changes which weren't reverted? I'm very curious to see them. ;-) Anyway, I understand that my proposal isn't necessary at the moment and that the current anti-vandalism tools seem to work well, which is good news! --Epìdosis 09:58, 20 December 2020 (UTC)
The report is there. It is fairly simple to regenerate it, so let me know if you need an update. —MisterSynergy (talk) 17:46, 20 December 2020 (UTC)[reply]
Instead of having broad rules to protect items, I would prefer it if we more often protected items permanently instead of protecting them for 6 months. Generally, having a protection category that only protects labels/aliases/descriptions and still allows other edits might also be useful. ChristianKl 14:57, 20 December 2020 (UTC)

Icons for warning templates

These icon were changed with any discussion and I don't see them any good (other too). I know it's added by myself in that template but at that time I just copied the syntax of another template. I think we need a discussion that community actually wants them or not. ‐‐1997kB (talk) 14:30, 15 January 2021 (UTC)[reply]

And what's Wikimedia’s latest visual standard? ‐‐1997kB (talk) 14:31, 15 January 2021 (UTC)
@1997kB: The icon was changed without any discussion or with some discussion? They mean just the opposite. :) I’m open to community discussion, of course.
Wikimedia’s latest visual standard is documented at https://design.wikimedia.org/style-guide/, and it’s used by now in almost everything that is provided by the software (the notification icons at the top, the edit buttons of the item/property interface, and so on). —Tacsipacsi (talk) 20:06, 15 January 2021 (UTC)[reply]
By the way, the OOUI icons of the first and second templates of the series were added by Ladsgroup, so he may also have some thoughts about this change. —Tacsipacsi (talk) 20:12, 15 January 2021 (UTC)[reply]
Yes, we are trying to standardize the look of wikis to give users a consistent look and feel, so that they don't see fifty different shades of blue but only #36c instead (see the colors in the design guide). The same goes for icons. Also, keep in mind that while those glass and detailed icons might look okay on desktop, they don't look that good on small screens. I recommend keeping the OOUI icons (lots of other wikis are also doing this). 23:48, 15 January 2021 (UTC)
I would love to have a complete replacement here, but not a partial one. I couldn't find any good replacement for the level 4 and level 2 (the OOUI one is too bright) warnings, so having two different styles of icons is more confusing. Further, I think this needs to be discussed on project chat so that we get more opinions. ‐‐1997kB (talk) 03:07, 16 January 2021 (UTC)
@1997kB: The OOUI icon here looks actually the darker of the two for me. And changing three icons without discussion just because you haven’t found a fourth one, and then demanding community discussion for reverting is… er… rather unconventional. (By the way, I think would fit as a fourth icon, although I don’t think it’s me who should search for a fourth OOUI icon.) —Tacsipacsi (talk) 20:34, 16 January 2021 (UTC)[reply]
No, I'm not demanding anything; these icons were changed to OOUI without any discussion [4] [5] [6] [7] [8] [9] and I just restored them as they were. And as I showed before, Mike Novikoff even reverted one of the changes, so it's not me who is adding without discussion, it's the opposite. And yes, we need either a full replacement or none at all, as these templates are used collectively and having a different style of icons will be confusing for new users. ‐‐1997kB (talk) 03:04, 17 January 2021 (UTC)
@1997kB: Sorry, I have no idea how I could forget that some of these icons were originally non-OOUI. :( However, all but one of these changes were made over two years ago, so the status quo has been OOUI for quite a while. —Tacsipacsi (talk) 00:53, 18 January 2021 (UTC)

Moved from Special:Permalink/1343198064#Hey. ‐‐1997kB (talk) 14:56, 18 January 2021 (UTC)[reply]

Level of warning | Current icons with white BG | Current icons with black BG | OOUI 1 icons with white BG | OOUI 1 icons with black BG | OOUI 2 icons with white BG | OOUI 2 icons with black BG
Level 1
Level 2
Level 3
Level 4

So should we use OOUI (1/2) or icons that we already have? See comparison above. ‐‐1997kB (talk) 15:12, 18 January 2021 (UTC)[reply]

Notified participants of WikiProject Counter-Vandalism @Mike Novikoff, Ladsgroup, Ahmad252, Tacsipacsi: ‐‐1997kB (talk) 05:54, 19 January 2021 (UTC)[reply]
  • I am convinced that Wikimedia projects are serious enough to use icons with the same style, rather than a mixture of different icon packs. And in this regard, we should use the icons from OOUI for the main templates in the same way as we use them in the interface. Unfortunately, in practice, problems do arise that some icons are missing, but this is an excuse to solve these problems, and not go back years. In this particular case, it seems to me that the best option is either to use the OOUI 2 set, or to use OOUI 1, but with the same versions of the level 1 and 2 icons: (I uploaded it with the warning style). —putnik 12:49, 19 January 2021 (UTC)[reply]
  • OOjs is a disaster in many ways, and OOUI is aesthetically something like "welcome back to 1980s" (where CGA is great and EGA is an enhancement). If we talk about standardization, it should be noted that the templates in question originate in enwiki where they are widely used by Twinkle, Huggle and the likes, and enwiki is NOT going to change the icons. — Mike Novikoff 18:30, 19 January 2021 (UTC)[reply]
    @Mike Novikoff: Aesthetics is largely a question of personal preference, so I don’t think it’s worth debating (by the way, I like OOUI, as you might have figured out). A more objective question is visual consistency. Whether you like it or not, MediaWiki’s interface (including the Vector skin and the Wikibase interface) is transitioning to OOUI (if you don’t like it, you can try convincing people to change this general design decision at an appropriate place, but this is not such a place). Since these messages appear on Wikidata, I don’t see why we should remain consistent with templates on a completely different wiki rather than becoming consistent with the interface of this very wiki.
    Of the two OOUI sets, I also think the second is better—first, the first set is inconsistent with filled/unfilled (the fourth icon is still filled even with putnik’s unfilled orange information icon); second, filled icons are more eye-catching, which is a good thing IMO. —Tacsipacsi (talk) 19:10, 19 January 2021 (UTC)[reply]
  • I do not have a strong opinion about the icons. The Nuvola set is fine and OOUI-2 is fine. OOUI-1 is my least favored due to inconsistencies. I would like those icons to match the icons used by MediaWiki and other tools for consistency. --Jarekt (talk) 19:19, 19 January 2021 (UTC)
  • I would prefer ooui 2 given its consistency both between the icons themselves and the icons and the interface. I want to mention that making designs simpler looking has proven in study after study to reduce the cognitive load and making reading easier in small screens and it's much better in matter of accessibility. The first pack are not logos, they are paintings and less practical for us. Amir (talk) 23:57, 19 January 2021 (UTC)[reply]
  • My personal preference is the set we have right now, because I think those icons convey the meaning best to users. For example, the Level 4 icon in OOUI is the icon for a block, but with the warning we are trying to stop the user so that they don't get blocked. If this discussion ends up adopting OOUI, then OOUI 2 is much preferred as it's consistent, but that block icon could still use a replacement. ‐‐1997kB (talk) 07:49, 20 January 2021 (UTC)

Counter vandalism university master thesis potential

Hi: I've been asked about potential projects for a master's thesis related to cybersecurity in Wikidata. I really have no idea and thought about counter-vandalism. So, is there any problem or task suitable for development as a master's final project? Is there a better place to ask? Thanks. —Ismael Olea (talk) 10:54, 9 October 2023 (UTC)

@Olea: Every once in a while somebody proposes banning edits by anonymous (IP address) users in Wikidata, as a way to reduce vandalism. It would be useful to have a credible and thorough analysis of what fraction of anonymous edits could be considered vandalism, perhaps comparing to similar numbers for non-anonymous but non-autopatrolled users. Other avenues could be looking at what kinds of automated or semi-automated tools are actually helpful here. ArthurPSmith (talk) 19:29, 9 October 2023 (UTC)[reply]
Thanks! :) —Ismael Olea (talk) 10:38, 10 October 2023 (UTC)[reply]
@Olea: One thing I've wondered about is that, now that AI tools are more available, they can be used to generate plausible-looking bullshit data in large quantities. This could be harmful on Wikidata, where it is already hard to distinguish certain data from nonsense to begin with (identifier values etc.), so it is an even bigger challenge to patrol compared to Wikipedia. It would be interesting to try to assess just how big a threat this really is and whether it is even a likely scenario. Also, would it render existing AI tools for finding vandalism ineffective? This would be more in the domain of qualitative research, I guess. Infrastruktur (talk) 10:59, 10 October 2023 (UTC)
Interesting! El Pantera (talk) 12:44, 10 October 2023 (UTC)[reply]

Constraints toolbox and other stuff

As teased in the other discussion we had about countering vandalism, I published my Python library for property constraints evaluation: code. Given two entity revisions, it can identify differences and tell whether any property constraints were violated (or satisfied). It implements algorithms for most constraint types, and additionally some custom rules (ideas welcome). I'm currently using it to scan past changes for overlooked vandalism.
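The first step described above, identifying what changed between two entity revisions, could look roughly like the sketch below. It is not the library's API: the function names are hypothetical, the input is assumed to follow the usual Wikibase entity JSON layout (`claims` mapping property IDs to statements with a `mainsnak`), and only main values are compared, ignoring qualifiers, references, and ranks.

```python
import json


def main_values(entity):
    """Collect (property, snak type, serialized value) for every statement."""
    values = set()
    for prop, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement["mainsnak"]
            value = snak.get("datavalue", {}).get("value")
            # Serialize with sorted keys so equal values compare equal.
            values.add((prop, snak["snaktype"], json.dumps(value, sort_keys=True)))
    return values


def diff_claims(old_entity, new_entity):
    """Return main values added and removed between two revisions."""
    before, after = main_values(old_entity), main_values(new_entity)
    return {"added": sorted(after - before), "removed": sorted(before - after)}
```

A constraint check would then run only on the properties that appear in `added` or `removed`, which keeps the evaluation focused on what the edit actually touched.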

In addition, I have identified a few software issues:

  • phab:T357163 – revisions are sometimes not marked as reverted when you use "restore"
  • phab:T357204 – changes in references produce obscure diffs

Another useful feature would be phab:T335256. When the task is resolved, it should be possible to filter out more sitelink additions.

Repeating some ideas for new tools from the discussion:

  • Have a bot patrol new links if there is "some match" with the item.
  • User script that provides machine translation inline, using WMF services.

Any more ideas for collaboration? --Matěj Suchánek (talk) 20:58, 24 February 2024 (UTC)[reply]