Tuesday, 18 August 2009

On programming, wikis and protecting against vandalbots

This post is a continuation/expansion of my previous post, where I describe the creation of a proof-of-concept vandalbot. So go read the first post... shoo! Done? Good, on to how to protect a wiki from automated nasties.

You may think, 'why not just block them on sight?' That's all well and good for sites like Wikipedia, where hundreds of thousands of users monitor recent edits, so any undesirable ones are quickly reverted and the offending user blocked. On smaller wikis, however, vandalism may go unnoticed for several hours or even days, which makes the following preventative measures necessary.

The most effective thing you can do is to install the AbuseFilter extension, and then set up rules to throttle edits (only allow X edits in Y minutes) from new/unregistered users. Throttling prevents vandalbots from editing wildly, giving admins a chance to see the vandalism and block the bot before much damage has been done. Rules can be programmed to trigger on just about anything, and can carry out a wide range of actions when tripped.
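To show what 'only allow X edits in Y minutes' means in practice, here's a rough Python sketch of a sliding-window throttle. It is not AbuseFilter's rule syntax, just an illustration of the idea; the class name, the limit of 5 edits per 60 seconds and the simulated bot are all made up for the example.

```python
from collections import defaultdict, deque


class EditThrottle:
    """Allow at most max_edits per window_seconds for each user (sliding window)."""

    def __init__(self, max_edits, window_seconds):
        self.max_edits = max_edits
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # user -> timestamps of accepted edits

    def allow(self, user, now):
        """Return True and record the edit if `user` is under the limit at time `now`."""
        stamps = self._recent[user]
        # Forget accepted edits that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_edits:
            return False
        stamps.append(now)
        return True


# Simulated bot: one edit attempt every 4 seconds (15 epm) for one minute,
# against a hypothetical limit of 5 edits per 60 seconds.
throttle = EditThrottle(max_edits=5, window_seconds=60)
accepted = sum(throttle.allow("Dfghsj01", t) for t in range(0, 60, 4))
print(accepted)  # 5 of the 15 attempts get through
```

The point is that the bot's speed stops mattering: however fast it tries to edit, only a handful of edits land before an admin has a chance to notice.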

This is not easy for those inexperienced with MediaWiki, though, nor is it possible for wikis hosted on external servers (such as Wikia). If you can do it, it is the best way to limit vandal/spam bot activity.

If your wiki is quite small, or aimed at a niche community, you could edit LocalSettings.php (assuming you have access to the file; I know Wikia wikis need to have such changes approved) to stop anonymous (unregistered) users from editing, and even to restrict new account creation to admins, so that prospective new users have to request an account. This will put off casual vandals, and make creating even a small set of vandalbot accounts difficult (if you suddenly get 30 requests in a day when you usually get 4, something is wrong).
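The '30 requests when you usually get 4' rule of thumb is easy to automate. Here's a minimal Python sketch, assuming you keep a list of daily account-request counts somewhere; the function name, the sample numbers and the threshold of three times the usual average are my own guesses, not anything MediaWiki provides.

```python
from statistics import mean


def is_request_spike(history, today, factor=3.0):
    """Return True when today's count exceeds `factor` times the historical daily mean."""
    if not history:
        return False  # no baseline to compare against
    return today > factor * mean(history)


usual_week = [4, 3, 5, 4, 4, 6, 2]  # made-up daily request counts, averaging 4
print(is_request_spike(usual_week, 30))  # True
print(is_request_spike(usual_week, 5))   # False
```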

OK, what if you are a wiki admin with no access whatsoever to the low-level settings of your wiki? What can you do to help protect it? Well, you could refer the site administrator to this post (^_^), or do the following:
  • Watch out for the mass-creation of user accounts, especially with nonsense names or incremental names (Dfghsj01, dfghsj02, etc.), and block them if at all suspicious.
  • If a bot does strike and cause mass havoc, fight fire with fire and use a bot (or a bot process running on your account) to undo the damage. I have created an antivandal utility which you can download here (source included (C#), dotnetwikibot included). It allows you to auto-revert a set number of edits by a certain user. It's very user-friendly and relatively fast.
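Spotting incremental account names by eye gets tedious, so here's a small Python sketch that does it for you. It assumes you already have a list of recently created usernames (how you get that list is up to you); the function name and the sample accounts are invented for the example.

```python
import re
from collections import defaultdict


def find_incremental_names(usernames, min_group=2):
    """Group usernames that share a stem and differ only in a trailing number."""
    groups = defaultdict(list)
    for name in usernames:
        # Split into a non-empty stem and a trailing run of digits.
        match = re.fullmatch(r"(.*?)(\d+)", name)
        if match and match.group(1):
            groups[match.group(1).lower()].append(name)
    # A lone 'Bob2009' is normal; several names on one stem are suspicious.
    return {stem: names for stem, names in groups.items() if len(names) >= min_group}


new_accounts = ["Dfghsj01", "dfghsj02", "Alice", "dfghsj03", "Bob2009"]
print(find_incremental_names(new_accounts))
# {'dfghsj': ['Dfghsj01', 'dfghsj02', 'dfghsj03']}
```

Treat the result as a list of accounts to look at, not a list of accounts to block; plenty of real people put numbers at the end of their usernames.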
In all probability, your wiki will never come under fire from a malicious bot - especially if you implement the preventative measures - but if it does, at least you now know what to do!

UPDATE: If you're interested to see just what malicious bots can do, have a look at this. It's the contribs list of a test bot operating on my recently-created test wiki. The bot created 50 pages in just under 2 minutes, and then 'vandalised' said pages at about the same rate (25epm). I also tried testing the pagemove routine, but Wikia obviously has a throttle to prevent mass-pagemoves.

Monday, 17 August 2009

On programming, wikis and a proof-of-concept vandalbot

Disclaimer: I am not to be held responsible for anything you may do with the code samples contained in this post. My 'vandal' bot is a proof-of-concept only, and I am giving an insight into its creation as an intellectual/programming/wiki-management exercise only.

Skip to the bit marked 'Programming ahead' if you already know/don't care about Wikipedia and the background to all this.

Before we get into the business of programming bots of wiki destruction, a little background. I used to be quite an avid Wikipedia editor/bot coder, up until early this year when the immense levels of bureaucracy and drama became too much to cope with. For those who are unaccustomed to Wikipedia, and therefore do not have the faintest idea what I am on about (few people do anyway), here's an analogy. Think of a group of teenage girls crossed with a government department, and that's a pretty good idea of what we're talking about. Don't get me wrong, WP is an amazing resource, but surely you can write an encyclopedia without having to have a week-long process for even the simplest thing?

Anyway, back to the main event. Wikipedia - and all wikis for that matter - have a persistent problem with vandalism. Usually this consists of puerile comments added into articles (usually high-profile ones), page blanking or link spam. However, there is a small group of extremely determined vandals (especially on high-profile, high-traffic sites) who edit vast numbers of pages in a very short time, often replacing their content with shock-site images or mindless crap designed to crash browsers (huge images, malformed HTML, etc). Another breed moves (renames) pages en masse to names such as ON WHEELS, or variations on HAGGER???? (the favourites of vandals known as 'Willy on Wheels' and 'Grawp' respectively).

One day about a month ago when I came across some of this 'vandalism-on-steroids', I wondered: how do they do it? And thus the vandalbot experiment began. I figured that they had to be using some kind of editing bot to do the job, based on how many edits per minute they were making. 30-40 epm is far too high a rate for even the most determined idiot. So, using my (then rather rusty) knowledge of bots, I set out to reverse-engineer (in a very loose sense) a method for mass-vandalism.

Warning: Programming ahead!

For this project, I used Visual C# Express 2008 and the dotnetwikibot API, both of which are free and easy to use.

I'm not going to go into the intricacies of using VC#, or programming using dotnetwikibot. There are plenty of tutorials and examples out there, and I am not writing a 'how to make a awsum vandalbot to pwn wikipedia!!!!111' for all the script kiddies out there. This is about finding out how bots can be used to the detriment of wikis, and how to protect against them. To those with experience with wikibots, this will seem pathetically simple, and it is. However, for wiki admins without such experience, how to protect against vandalbots is likely to be a complete unknown.

My first thought was that, at its simplest, a vandalbot must take a list of pages and edit them without human interaction, much like any editing bot, except that this one will hinder, not help, the wiki. With that in mind, I wrote my 'draft' vandalbot code.

Site site = new Site("http://en.wikipedia.org/wiki/", "botname", "botpass"); //Define the site to edit, and the bot's username and password
PageList pl = new PageList(site);
pl.FillFromAllPages("a", 0, true, 100); //Get the first 100 pages starting with 'a'
foreach (Page page in pl) //Iterate through the pages in the list of 100
{
    page.Save("Cheese!", "Eek!"); //Save each page replacing all content with 'Cheese!', with the edit summary 'Eek!'
}

This (when wrapped in an actual program) worked, and my tests (on a test wiki, 'experienced mad programmer on a closed wiki') showed that the bot could reach 12-14 epm quite easily. With refinement, the code could probably be made to reach 20+ epm.

Something more destructive than simply editing pages is creating new ones. Admin tools are required to delete pages, so this will lengthen cleanup time and increase disruption. To create pages instead, change the pl.FillFromAllPages line to
pl.FillFromFile(@"P:\ath\to\a\text\file");

The file should contain page titles, one on each line. The program will create each page in turn, with the same text and edit summary as before.

The final form of mass-vandalism, page moves, can also be executed by this program, by changing the page.save... line to
page.RenameTo((page.title + " a suffix"), "Move summary");

I tested all of these bots on the test wiki, and let me assure you, in the hands of even a programming-dunce vandal, they could cause mass damage to wikis. What's even more worrying is the prospect of several vandals all launching an attack at once. The editing routine could obliterate a small wiki within a couple of hours, and a co-ordinated attack could do the same to even a larger site:
Small scale:
15epm with 10 vandals starting at different offsets = 150epm
150epm on a 10,000 page wiki (e.g. medium sized Wikia): ~67 minutes to vandalise every page, assuming no reversions

Large scale:
15epm with 500 vandals (not unrealistic, think 4chan) = 7500epm(!)
7500epm on a 1,000,000 page wiki (largish Wikia, Wikipedia): ~133 minutes to vandalise every page, assuming no reversions.

No small mess for wiki admins to clear up, especially without a bot or any reversion scripts.
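If you want to plug in the numbers for your own wiki, the back-of-envelope sum is just pages divided by total edits per minute. Here's a quick Python sketch (the function name is mine, and like the estimates it assumes nobody reverts anything and every bot hits a different page):

```python
def minutes_to_cover(pages, epm_per_bot, bots):
    """Minutes for `bots` bots, each editing at `epm_per_bot`, to edit every page once."""
    return pages / (epm_per_bot * bots)


print(round(minutes_to_cover(10_000, 15, 10)))      # 67
print(round(minutes_to_cover(1_000_000, 15, 500)))  # 133
```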

EDIT: The section about protecting against vandalbots has been split off into its own post here. After further research, I felt that it deserved more in-depth coverage.

Whew... long post. Hopefully this has given you an insight into wiki (vandal) bots, and how to protect a wiki from the robot hordes! It was certainly an interesting experiment for me (I ended up adding a GUI, options to select which mode the thing operates in, textboxes for the page text and summary, and a log window, because I'm like that). Just keep in mind, I don't condone wiki vandalism, and I'm not responsible for anything that you may do after reading this article, e.g. creating an army of vandalbots, getting blocked from every wiki from here to Timbuktu, falling into a singularity, etc.