For a little over half an hour yesterday, just about every website that I am responsible for keeping online went down. The first sign of trouble was a series of messages on Nice.Social. I then tried a few other 10C-based sites and couldn't load anything reliably. Having seen this a couple of times before, I assumed my ISP had once again changed my static IP assignment¹. Testing that theory proved it wrong, so I tried other remedies. Rebooting the router did not fix the problem. Rebooting the server did not fix the problem. Then I decided to check CloudFlare and saw their site returning the very same error as mine: 502 Bad Gateway.
Turns out that CloudFlare, a service that I rely on quite a bit, was taken offline by a software update gone bad.
In the first couple of hours after service was restored, when geeks could get back to being angry on forums, a number of people said that they had "had enough of CloudFlare" and would take their business elsewhere. Others lamented that the Internet had become "too brittle" because so many people depend on a handful of American companies for critical services. Some said things so absurd that there's no point repeating them, but feel free to explore some of what people on The Register had to say.
As a paying customer of CloudFlare, I'll admit to quite a bit of frustration when none of my sites or services, whether hosted at home or elsewhere, were responding to web traffic even though SSH worked just fine. That said, things happen, and my systems were unavailable for anywhere between 27 and 50-odd minutes². I have SLAs with clients that must be honored, and in all this outage will cost me about $40 in credits applied to their accounts. Not a huge amount, but not something I would like to hand out daily. Will I be moving off their service to use someone else? Not a chance.
While the frustration and downtime were unfortunate, there's really nothing better or easier that I'm aware of. A number of my APIs integrate with the CloudFlare API, making it possible to programmatically create, update, and deactivate DNS records and SSL certificates, trigger DDoS protection, and more. Beyond this, the people at CloudFlare have been pretty open about the problem and work hard to run one of the most reliable systems on the planet. Jumping ship because of an occasional hiccup like this would be premature. If outages become a monthly or weekly pattern, then I'll be more motivated to find or build an alternative. Currently this simply isn't the case.
I trust that CloudFlare will learn from the mistake and ensure it doesn't (easily) happen again in the future.
¹ Yes, I know. I've had the same conversation with their tech support on numerous occasions. It still seems that every 45 to 60 days I'm given a drastically different IP address.
² Based on server access logs here at home as well as on Amazon's EC2 instances.