Link Rot

Books go out of print, although you can usually find them in a library or used book store if you are desperate enough. Soon more publishers will offer print on demand for rare and out-of-print books, which is great. And when books pass into the public domain you can sometimes find them on Project Gutenberg. Music CDs go out of print, as do DVDs. That’s even harder to deal with, although there is a big second-hand market online you can explore.

But what do you do when a URL goes dead? If the page died recently you might simply find it in Google’s or Yahoo!’s cache, but after a while I’ve found those caches get refreshed and end up showing the same error message. Well, there is the Internet Archive, which runs the Wayback Machine, but that is not very reliable either. It sometimes works, but many sites block search engine robots from crawling them. Other sites are simply difficult for the Wayback Machine to crawl, so you can find the front page, but all the links from it are dead.

This is a serious problem for scholars and teachers. A study done in 2002 found that links in some biochemistry courses decayed the way radioactive isotopes do:

The links in the three courses had a half-life of 55 months: Half of the links would be expected to have died in 55 months, half of the remaining links would be expected to have died in another 55 months, and so forth.

I don’t know if things have improved any.

I am sad to admit that I am personally a source of link rot. (Maybe there is some kind of cream I should be using?) Having recently moved my entire web site over to TextDrive, I ran into numerous problems leading to link rot.

The first involved the various wiki packages I was using. My main wiki uses MediaWiki as the backend, and in moving to the new server I did a fresh install and updated everything to comply with certain file-naming standards I had foolishly ignored the first time I installed the software. As a result, all the old links are now dead! I could probably go through and redirect each old link to its new location, one by one, but I took the easy way out and created an error page that appears to anyone who follows an old link, telling them how to fix it.
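A redirect of that sort can often be generated rather than maintained by hand. Here is a minimal sketch, assuming the breakage came from MediaWiki’s standard naming rules (spaces become underscores, first letter capitalized); the base URL and the function names are placeholders, not this site’s actual setup:

```python
# Sketch: map an old wiki page name to MediaWiki's canonical form.
# Assumes the broken links differ from the new ones only by MediaWiki's
# naming convention (spaces -> underscores, first letter uppercased);
# that assumption, and the base URL, are hypothetical.

def canonical_title(old_title: str) -> str:
    """Normalize a page title the way MediaWiki does."""
    title = old_title.strip().replace(" ", "_")
    if title:
        title = title[0].upper() + title[1:]
    return title

def redirect_url(old_title: str, base: str = "https://example.org/wiki/") -> str:
    """Build the new URL an error page could suggest (or 301 to)."""
    return base + canonical_title(old_title)
```

For example, `redirect_url("link rot")` yields `https://example.org/wiki/Link_rot`, which an error page could display, or which a server-side handler could issue as a 301 redirect.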

But that was not the only source of link rot. Two software packages were running older portions of my site, and moving them to the new server was too much work, so I simply removed them. I thought I could compensate by linking to the Internet Archive’s copies of those pages, but that didn’t work: the archive had never properly stored the whole site. So now those pages are simply lost in the ether.

Also, all my URLs changed a few years back when I moved my blog from Movable Type to WordPress. At the time somebody helped me write a redirect script to point the old URLs at the new site, but that script was lost in the move as well, and I lack the time and skills to recreate it.
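For what it’s worth, such a redirect script can be quite small. The sketch below is hypothetical: the `/archives/000123.html` pattern resembles an old Movable Type default, but the id-to-permalink table and the paths in it are invented for illustration, not recovered from the lost script:

```python
# Sketch: redirect old Movable Type-style permalinks to WordPress ones.
# The URL pattern and the id -> slug table below are hypothetical
# examples, not this site's actual URLs.
import re

OLD_TO_NEW = {
    123: "/2004/05/link-rot/",    # MT entry id -> WordPress permalink
    124: "/2004/06/wiki-moves/",
}

MT_PATTERN = re.compile(r"/archives/(\d+)\.html$")

def new_location(old_path: str):
    """Return the WordPress path for an old MT path, or None if unknown."""
    match = MT_PATTERN.search(old_path)
    if not match:
        return None
    return OLD_TO_NEW.get(int(match.group(1)))
```

A CGI script or web framework handler could call `new_location()` on each incoming 404 and issue a 301 redirect when it returns a path, falling back to an error page otherwise.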

I did some searching and found that there aren’t many links to my older material; it is only in the last couple of years that my site began attracting much attention. So I’m not really that worried about all those dead URLs, although it is frustrating that some Google searches for my older stuff lead nowhere. Conversely, I have no idea how many of the sites I linked to over the years are still around. I’m not sure I want to find out.

All this is a good argument for wikis, since anyone who finds a dead link can update it. Anyone can self-publish on the web, which is great, but it also means everyone is personally responsible for preventing link rot on their own site. As I just discovered, that isn’t so easy.

6 thoughts on “Link Rot”

  1. It’s a problem I’m in the middle of dealing with, as a DoS/flooding attack has knocked my personal site, One Man’s Opinion, off the net — nearly 5 gigs of traffic a day is not pleasing my host! So they yanked the site and (apparently) deleted all the files (don’t use Maxipoint hosting service, folks!). I moved the domain to my other webhost and uploaded my most recent good backup, which is nearly a year old (I had a hard drive failure in June, and my backups from the beginning of the year have been lost; I was updating so infrequently over the summer that I didn’t bother backing up in July and August). The attacks followed, so I’ve pulled the domain completely and am considering starting over with a new domain name. While the content is mostly intact (thanks not to the Wayback Machine, which coincidentally seems to have stopped indexing my site at the same time as my most recent good backup — go figure! — but to Google’s cache), the pages will no longer live where people have linked to them, and if the domain name is no longer good, there’s no way to redirect traffic automatically. Such is the fate of the hyperlinked web — in one fell swoop, some a-hole has doomed not me but everyone who has ever linked to me to link rot.

  2. I saw something about this in your RSS feed, but I didn’t realize how bad it was. Sheesh. I think Spam Karma, which we use to protect this site, has anti-flooding protection of some kind, but I’m not exactly sure how it works. I believe if you try to post too many comments in quick succession you are automatically banned and blocked from the site. Though I doubt there is much that can be done if a determined hacker has it out for you.

    How they handle such attacks is one of the things I asked about at TextDrive before signing up, and I was fairly pleased at their response, which was that they would work with me rather than treating me like a criminal if something like this happened.

  3. Yeah, when I re-uploaded the old install, it sent last November’s posts out through the RSS feed. I’m hoping that my new host, a business-oriented host that I’ve been using for other sites for almost two years, will lend me a hand with this. They were already nice enough to let me know that traffic had spiked way up — before they started charging me for bandwidth. The other host (Maxipoint, folks — don’t use ’em!) just yanked the site and didn’t tell me why until I opened a tech support ticket to ask why my site was down. Ironically, I had installed a blacklist program a couple of months ago and for the first time saw a marked decrease in comment and referrer spam — now I’m wondering whether that just made some spammer all the more determined. I can’t imagine my 50 or so visitors a day would matter that much to a spammer, though — so I’m thinking I offended someone, which is a good feeling, in a way, though not a fair exchange for losing the site completely.

  4. Oh… yes, this is really annoying. That’s the problem with dynamically generated pages: you have less control over them. I’ve linked to your wiki pages many times, so I’ll have to go through my blog and update them. At least you have set up that error message. Most site owners don’t care (those who haven’t yet figured out what the internet is about – it isn’t a set of isolated webpages but a network of pages linking to each other).

  5. I’m actually thinking that only a couple of articles account for most of the links to my wiki, so I might be able to set up a special redirect for each of them. Don’t change your links just yet!
