Monday 17 August 2009

On programming, wikis and a proof-of-concept vandalbot

Disclaimer: I am not to be held responsible for anything you may do with the code samples contained in this post. My 'vandal' bot is a proof-of-concept only, and I am giving an insight into its creation as an intellectual/programming/wiki-management exercise only.

Skip to the bit marked 'Programming ahead' if you already know/don't care about Wikipedia and the background to all this.

Before we get into the business of programming bots of wiki destruction, a little background. I used to be quite an avid Wikipedia editor/bot coder, up until early this year when the immense levels of bureaucracy and drama became too much to cope with. For those unaccustomed to Wikipedia, who therefore do not have the faintest idea what I am on about (few people do anyway), here's an analogy. Think of a group of teenage girls crossed with a government department, and that's a pretty good idea of what we're talking about. Don't get me wrong, WP is an amazing resource, but surely you can write an encyclopedia without having to have a week-long process for even the simplest thing?

Anyway, back to the main event. Wikipedia - and every wiki for that matter - has a persistent problem with vandalism. Usually this consists of puerile comments added into articles (usually high-profile ones), page blanking or link spam. However, there is a small group of extremely determined vandals (especially on high-profile, high-traffic sites) who edit vast numbers of pages in a very short time, often replacing their content with shock-site images or mindless crap designed to crash browsers (huge images, malformed HTML, etc). Another breed moves (renames) pages en masse to names such as ON WHEELS, or variations on HAGGER???? (the favourites of vandals known as 'Willy on Wheels' and 'Grawp' respectively).

One day about a month ago when I came across some of this 'vandalism-on-steroids', I wondered: how do they do it? And thus the vandalbot experiment began. I figured that they had to be using some kind of editing bot to do the job, based on how many edits per minute they were making. 30-40 epm is far too high a rate for even the most determined idiot. So, using my (then rather rusty) knowledge of bots, I set out to reverse-engineer (in a very loose sense) a method for mass-vandalism.

Warning: Programming ahead!

For this project, I used Visual C# Express 2008 and the DotNetWikiBot API, both of which are free and easy to use.

I'm not going to go into the intricacies of using VC#, or programming using DotNetWikiBot. There are plenty of tutorials and examples out there, and I am not writing a 'how to make a awsum vandalbot to pwn wikipedia!!!!111' for all the script kiddies out there. This is about finding out how bots can be used to the detriment of wikis, and how to protect against them. To those with experience with wikibots, this will seem pathetically simple, and it is. However, for wiki admins without such experience, how to protect against vandalbots is likely to be a complete unknown.

My first thought was that, at its simplest, a vandalbot must take a list of pages and edit them without human interaction, much like any editing bot, except that this one will hinder, not help, the wiki. With that in mind, I wrote my 'draft' vandalbot code.

Site site = new Site("http://en.wikipedia.org/wiki/", "botname", "botpass"); //Define the site to edit, and the bot's username and password
PageList pl = new PageList(site);
pl.FillFromAllPages("a", 0, true, 100); //Get the first 100 pages starting with 'a'
foreach (Page page in pl) //Iterate through the pages in the list of 100
{
page.Save("Cheese!", "Eek!"); //Save each page replacing all content with 'Cheese!', with the edit summary 'Eek!'
}

This (when wrapped in an actual program) worked, and my tests (on a test wiki, 'experienced mad programmer on a closed wiki') showed that the bot could reach 12-14 epm quite easily. With refinement, the code could probably be made to reach 20+ epm.

Something more destructive than simply editing pages is creating new ones. Because admin tools are required to delete pages, this lengthens cleanup time and increases disruption. This requires the pl.fill... line to be changed to
pl.FillFromFile(@"P:\ath\to\a\text\file");

The file should contain page titles, one on each line. The program will create each page in turn, with the same text and edit summary as before.

The final form of mass-vandalism, page moves, can also be executed by this program, by changing the page.save... line to
page.RenameTo((page.title + " a suffix"), "Move summary");

I tested all of these bots on the test wiki, and let me assure you, in the hands of even a programming-dunce vandal, they could cause mass damage to wikis. What's even more worrying is the prospect of several vandals all launching an attack at once. The editing routine could obliterate a small wiki within a couple of hours, and a co-ordinated attack could do the same to even a larger site:
Small scale:
15 epm with 10 vandals starting at different offsets = 150 epm
150 epm on a 10,000 page wiki (e.g. medium sized Wikia): ~67 minutes to vandalise every page, assuming no reversions

Large scale:
15 epm with 500 vandals (not unrealistic, think 4chan) = 7500 epm(!)
7500 epm on a 1,000,000 page wiki (largish Wikia, Wikipedia): ~133 minutes to vandalise every page, assuming no reversions.
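
For anyone who wants to rerun the figures above with their own wiki's page count, the arithmetic is just pages divided by the combined edit rate. Here is a minimal sketch of that calculation only. The class and method names (CoverageEstimate, MinutesToCoverWiki) are made up for illustration and are not part of DotNetWikiBot, and it assumes, as above, that nothing gets reverted and that no two bots hit the same page.

```csharp
using System;

public static class CoverageEstimate
{
    // Hypothetical helper: minutes needed for 'bots' bots, each editing at
    // 'epmPerBot' edits per minute, to touch every one of 'pages' pages once.
    // Assumes no reversions and no overlap between the bots' page ranges.
    public static double MinutesToCoverWiki(int pages, int epmPerBot, int bots)
    {
        int combinedEpm = epmPerBot * bots;   // e.g. 15 * 10 = 150 epm
        return (double)pages / combinedEpm;   // pages / (edits per minute)
    }

    public static void Main()
    {
        // Small scale: 10,000 pages at 150 epm, roughly 67 minutes
        Console.WriteLine(MinutesToCoverWiki(10000, 15, 10));

        // Large scale: 1,000,000 pages at 7500 epm, roughly 133 minutes
        Console.WriteLine(MinutesToCoverWiki(1000000, 15, 500));
    }
}
```

The same division works in reverse for admins: divide the page count by the edit rate your cleanup can sustain to get a rough idea of how long repairs would take.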

No small mess for wiki admins to clear up, especially without a bot or any reversion scripts.

EDIT: The section about protecting against vandalbots has been split off into its own post here. After further research, I felt that it deserved more in-depth coverage.

Whew... long post. Hopefully this has given you an insight into wiki (vandal) bots, and how to protect a wiki from the robot hordes! It was certainly an interesting experiment for me (I ended up adding a GUI, options to select which mode the thing operates in, textboxes for the page text and summary, and a log window, because I'm like that). Just keep in mind, I don't condone wiki vandalism, and I'm not responsible for anything that you may do after reading this article, e.g. creating an army of vandalbots, getting blocked from every wiki from here to Timbuktu, falling into a singularity, etc.
