Dead Links

In 1989 Sir Timothy John "Tim" Berners-Lee devised a wonderful way for electronic documents to reference each other by way of the hyperlink. This simple little idea allowed one document to explicitly reference another located somewhere else, and provided a way for people to follow the link and access that document. Now, nearly a quarter century later, the href plays such an important role in everything online that it's very rare that we stumble across a page that doesn't have one. It's that important to us. But there's a problem … links, like humans, are mortal.

[Image: torn-out book pages]

A cursory check of 500 links [1] in blog posts on this site written between 2007 and 2009 shows a ridiculous number of problems. Invalid domains, redirects to domain-squatting pages [2], and 404s due to deleted content are just some of the issues. The whole point of a hyperlink is that content is expected to exist when we click the linked text. If it doesn't, then the reader is left without context.
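For the curious, here is a rough sketch of how a check like this could sort links into the failure types mentioned above. It is not the script used for the numbers in this post. The names `LinkStatus`, `classify_link`, and the `fetch` callback are all made up for illustration, and the "redirected to another domain" test is only a hint that a link might have been taken over by a squatter, not proof.

```python
from enum import Enum
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


class LinkStatus(Enum):
    OK = "ok"
    INVALID_DOMAIN = "invalid domain"
    REDIRECTED_OFFSITE = "redirected to another domain"
    NOT_FOUND = "404 not found"
    OTHER_ERROR = "other error"


# Hypothetical fetcher: takes a URL and returns (HTTP status, final URL after
# redirects), or None when the host name could not be resolved at all.
Fetcher = Callable[[str], Optional[Tuple[int, str]]]


def classify_link(url: str, fetch: Fetcher) -> LinkStatus:
    """Sort one link into a rough health category."""
    result = fetch(url)
    if result is None:
        return LinkStatus.INVALID_DOMAIN

    status, final_url = result
    if status == 404:
        return LinkStatus.NOT_FOUND
    if status >= 400:
        return LinkStatus.OTHER_ERROR

    # Assumption: landing on a different host is suspicious. This also flags
    # harmless moves (for example example.com to www.example.com), so a human
    # still needs to look at these.
    if urlsplit(url).hostname != urlsplit(final_url).hostname:
        return LinkStatus.REDIRECTED_OFFSITE
    return LinkStatus.OK
```

Passing the fetcher in as an argument keeps the sorting logic separate from the networking, so the same function works with a real HTTP client or with canned results.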

[Image: screenshot, 2013-04-08]

Where this really starts to create problems is in the plethora of link-blogs that are trying to mimic John Gruber's success with Daring Fireball. We can see examples like the image above all over the web. A single sentence under a link that takes us elsewhere. The link on Mr. Gruber's site will take us to SplatF … but how long will SplatF be around to host that content? Will somebody in 20 years try to do some digging on people's responses to Microsoft's recent endeavours, come across this Daring Fireball post, and see "Cannot Find Server" when trying to read the SplatF content?

Perhaps. We'll probably never know, though.

One of the greatest misperceptions people have of the Internet is that "once something is online, it's there forever". Clearly this is not the case. Instead people should say something more accurate, like "once you make an ass of yourself online, the story will follow you to the grave". It's clear to anybody who has used the web for more than five years that things do, indeed, disappear. This can happen for any number of reasons, but it raises the question of whether we should be using hyperlinks at all.

Slow Down There, Speedy

I'm not suggesting everybody give up using hyperlinks or link only to content they themselves control. Doing this would undermine one of the greatest benefits links offer: the easy dissemination of current information. But there's the rub. Links are really only useful for things happening right now. For historical purposes, links may become little more than the digital equivalent of a dead end. There has to be a better solution.

One such solution would be for content management systems to go the extra mile and locally store the content they link to. Links on a page would first go through the CMS and, if the URL is found to be dead, the cached copy would be displayed in place of the live version. This way context would be better preserved for future readers. Another advantage is that the web would become far more resilient to censorship and to time itself, as archives of the most important content would exist in various locations all over the planet.
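To make the idea a little more concrete, here is a minimal sketch of the fallback described above, assuming the CMS keeps its snapshots in a simple in-memory dictionary. No existing CMS is being described here: `LinkArchive`, `remember`, `resolve`, and the `is_alive` and `fetch` callbacks are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class LinkArchive:
    """Hypothetical CMS helper that keeps a local copy of linked pages."""

    is_alive: Callable[[str], bool]  # assumed check: does the URL still respond?
    fetch: Callable[[str], str]      # assumed download: returns the page content
    snapshots: Dict[str, str] = field(default_factory=dict)

    def remember(self, url: str) -> None:
        """Store a local copy, for example when a post is published."""
        self.snapshots[url] = self.fetch(url)

    def resolve(self, url: str) -> Tuple[str, Optional[str]]:
        """Decide what a reader who clicks the link should get.

        Returns ("live", None) when the original is still reachable,
        ("cached", content) when it is dead but a snapshot exists,
        and ("missing", None) when there is nothing left to show.
        """
        if self.is_alive(url):
            return ("live", None)
        if url in self.snapshots:
            return ("cached", self.snapshots[url])
        return ("missing", None)
```

In this sketch the live page always wins, and the stored copy is only shown once the original has disappeared. A real system would also need to persist the snapshots somewhere and decide how often to refresh them.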

Unfortunately, the problems that arise with this sort of solution are quite severe. There would be issues with copyright, ownership, fair use, and a host of other legal problems that would get in the way of making this a tenable option. Websites might no longer have the option to put content behind a paywall after a certain amount of time has passed, as there could be millions of copies all over the web. People who host their own websites would need to ensure their sites have hundreds of gigabytes of storage available if there are links to videos and other media-heavy content. Website providers such as Blogger and Squarespace would likely never be able to offer this sort of utility, given all its complexity [3].

So what's the solution?

I don't have any good answers, but this is something that has caused trouble for people all over the world as websites undergo changes or disappear altogether. There are some web archiving services out there, but none of them are really capable of handling all the content that gets created on a daily basis. It's just not realistic to expect them to hold onto this stuff. What's to guarantee these services will be around in 25 years, anyway? No … if we are to answer this problem, I believe it needs to be done on a site-by-site basis. The best option, albeit not a very realistic one, would be for every site to remain online indefinitely. What the web needs right now is a better 'Plan B'.