The perils of fancy 404 pages
October 21st, 2005
I got a warning from my ISP today that I have exceeded my bandwidth limit (2GB/month). Now, even with the extra traffic the site has got recently (MediaGuardian diary sent over several thousand eager beavers to the Daily Mail-o-matic), and the new design, that’s still a lot, for only 70% of the month gone by, and more than the site has ever had to transfer in a whole month.
I ran analog on my access log files, thinking either a rogue bot or RSS reader was taking more than its fair share of bandwidth (as has happened before), but it only said 1.4GB had been used up this month. My ISP’s own log analyser gave the same figure. So where did the other 0.6GB come from?
I was baffled, until I noticed that there had been an unholy number of failed trackback attempts. I haven’t had trackbacks on this site in at least a couple of years; I found the spammers were too annoying and deleted the scripts which handled them. But the old trackback URLs (which were part of the old blogging CMS that I myself coded, before I switched to WordPress) were still getting pinged, presumably as they’re on some sort of spammer’s hotlist. Normally it’s fine, but this time, they had been pinged 18,611 times over the past 21 days. And each time, they’ve been getting a 404 error. Not just a normal 404 error, but a nice pretty 404 page with soothing instructions on what might have gone wrong, incorporated within the standard blog template, which in its entirety is about 33kB of HTML alone. Multiply the two numbers together (18,611 × 33kB comes to roughly 614MB), and hey presto, we have our missing 0.6GB (or thereabouts). Since the log analysers only include successful page transfers in their bandwidth counts, none of those 404 responses were getting counted.
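The sum is easy to check, and the same check can be run against the raw access log instead of trusting a report that only counts successful transfers. Here is a minimal sketch in Python, assuming the log is in Apache's common or combined format (the status code and byte count come straight after the quoted request). The function name `bytes_by_status` is made up for illustration and is not part of analog or any other tool.

```python
import re
from collections import defaultdict

# Back-of-envelope check using the figures from the post.
failed_pings = 18_611
page_size_kb = 33
missing_gb = failed_pings * page_size_kb / 1_000_000  # kB -> GB, decimal units
print(f"{missing_gb:.2f} GB")  # prints "0.61 GB"

# Assumption: Apache common/combined log format, where the quoted request
# is followed by the status code and the response size ("-" if no body).
LOG_LINE = re.compile(r'"[^"]*" (\d{3}) (\d+|-)')


def bytes_by_status(lines):
    """Sum response bytes per HTTP status code (hypothetical helper)."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip anything that doesn't look like a log entry
        status, size = match.groups()
        totals[status] += 0 if size == "-" else int(size)
    return dict(totals)
```

Feeding it the lines of an access log gives one total per status code. A large figure against `"404"` would be bandwidth that a successes-only report never shows.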
Fancy 404 errors have been turned off for now. When I get a little time, I’ll rewrite .htaccess so that normal 404s (which number in the low tens per month) get a pretty page, but requests for trackback URLs do not. It’s entirely my fault - I knew those URLs were being pinged and it was only a matter of time before they got attacked in a big way, though I never thought that this many pings would ever be attempted. Anyone else who has fancy 404 error pages and the same possibility of suffering ping attacks, or who is just wondering why analog and their ISP’s bandwidth totals don’t agree, might want to bear this in mind.
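The planned rule boils down to one decision per missing URL. The real fix would live in .htaccess, but the logic can be sketched in Python. Everything named here is an assumption: the post doesn't give the old trackback URL format, so the `/trackback/` pattern is hypothetical, and `PRETTY_404` is a stand-in for the full blog template.

```python
import re

# Hypothetical pattern: the real trackback URL format isn't given in the post.
TRACKBACK_PATH = re.compile(r"^/trackback/")

# Placeholder for the ~33kB templated page; not the real markup.
PRETTY_404 = "<html>...full blog template with helpful hints...</html>"
# Bare-bones body for requests no human will ever read.
BARE_404 = "Not Found"


def not_found_body(path):
    """Pick the 404 body: tiny for trackback pings, pretty for everything else."""
    if TRACKBACK_PATH.match(path):
        return BARE_404
    return PRETTY_404
```

With a body this small, 18,611 failed pings would cost on the order of 170kB of body text (response headers not counted) rather than 0.6GB, while the few dozen human visitors a month who hit a dead link still get the friendly page.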







October 24th, 2005 at 08:13:52
4 of my top 5 user agents are bots; only IE wins out over them (slightly gratifying in itself as it means I’m not the most frequent visitor to my site…). Fortunately my site is nowhere near as popular as yours and so I don’t have to worry about bandwidth restrictions, but I am certain (due, apart from anything else, to my migration from Blogger to WordPress) that I’ll have any number of dead links &c that will screw me in the time to come.
October 24th, 2005 at 16:39:05
Don’t forget that Pop Bitch linked you on last week’s newsletter too!