How Google’s Outage Affected Calvin Last Night and What I’m Now Rethinking
Our NestCams are constantly on alert and I get periodic notifications when Calvin moves into different zones. After an hour of eerie silence, I opened the Nest app and was greeted by the message below. It turned out to be a 4-hour outage.
What do the NestCams actually watch?
Now these are just cameras, right? Just walk downstairs and check on him, responsible dad. It’s not that damn difficult. Well, it’s only partially true that they are “just cameras.” They also:
- Alert if Calvin is going to the bathroom
- Alert if Calvin is sleeping on the floor
- Alert if Calvin is sleeping on the couch
- Alert if Calvin goes into his bedroom during the day (where he could go to the bathroom)
Now the sleeping issues matter because he could end up awake all night, but the bathroom issues are more important. And if Calvin had been in a water-playing mode, this could have been disastrous.
What’s the lesson here?
Avoid using any cloud services? No, not at all. Would you cancel Netflix if they had a 4-hour outage? No, that’s ridiculous. The cloud has limitless power and benefit, but we have to build around the frailties of software, infrastructure and human decision. We’re light years past the question, “Is it worth moving to the cloud?”
Safety alerts should fire regardless of cloud connectivity.
Notice that I’m not saying “don’t rely on the cloud.” The cloud is reliable the vast majority of the time, but life and technology happen. We have to make sure that there’s always a way for safety alerts to go out regardless of whether the system is connected to the Internet.
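Here’s a rough sketch of what that could look like, not anything I’ve actually built yet. The idea is to hand every alert to a list of channels and try all of them, so a dead cloud connection never blocks a local one. The names `send_alert`, `cloud_push` and `local_siren` are made up for illustration; the real versions would wrap whatever push service and in-home chime are actually in use.

```python
from typing import Callable, List

# A channel takes a message and returns True if it was delivered.
Channel = Callable[[str], bool]


def send_alert(message: str, channels: List[Channel]) -> List[int]:
    """Try every channel in order and return the indexes that delivered.

    Every channel is attempted, so a cloud failure never blocks the
    local ones, and an exception in one channel is treated as a miss.
    """
    delivered = []
    for index, channel in enumerate(channels):
        try:
            if channel(message):
                delivered.append(index)
        except Exception:
            # A broken channel must not stop the remaining ones.
            continue
    return delivered


# Hypothetical channels, stand-ins for real integrations.
def cloud_push(message: str) -> bool:
    # Simulates the cloud being unreachable during an outage.
    raise ConnectionError("no route to the Internet")


local_log: List[str] = []


def local_siren(message: str) -> bool:
    # Stand-in for a chime or speaker on the home network.
    local_log.append(message)
    return True


result = send_alert("Calvin is in the bathroom", [cloud_push, local_siren])
print(result)  # [1]
```

An empty result means nothing got through, which is itself worth treating as an emergency.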
We need to simulate outages regularly.
This is a common practice in the business world, typically called a game day. On a game day, we would actually unplug the Internet and see how the system handles it. If the Internet is fucked, we can’t be.
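Before physically pulling the plug, a drill like this can be rehearsed in software. The sketch below is a toy model under my own assumptions: `HomeAlertSystem` and `run_game_day` are hypothetical names, and the “outage” is just a flag. It replays a few events with the Internet switched off and reports which ones never got delivered.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HomeAlertSystem:
    """Hypothetical stand-in for the real alerting setup."""
    has_local_fallback: bool = True
    internet_up: bool = True
    delivered: List[str] = field(default_factory=list)

    def alert(self, message: str) -> None:
        # An alert gets through if the cloud is reachable
        # or if there is a local path that does not need it.
        if self.internet_up or self.has_local_fallback:
            self.delivered.append(message)


def run_game_day(system: HomeAlertSystem, events: List[str]) -> List[str]:
    """Cut the Internet, replay events, and return the ones that were lost."""
    system.internet_up = False  # the simulated unplugging
    try:
        for event in events:
            system.alert(event)
        return [event for event in events if event not in system.delivered]
    finally:
        system.internet_up = True  # always restore after the drill


events = ["bathroom", "sleeping on the floor"]

resilient = HomeAlertSystem(has_local_fallback=True)
print(run_game_day(resilient, events))  # []

cloud_only = HomeAlertSystem(has_local_fallback=False)
print(run_game_day(cloud_only, events))  # ['bathroom', 'sleeping on the floor']
```

A non-empty list is the to-do list that comes out of the game day.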
“Outage” can also mean “power outage.”
Sure, the Internet or my wireless router can go down, but the power grid isn’t infallible either. A really important question is: how do we handle a complete technology failure? Better yet, how can we know immediately that there is a complete technology failure? We need to know as quickly as possible if shit is down.
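One common pattern for “know immediately” is a heartbeat, sometimes called a dead man’s switch: the home system checks in on a schedule, and silence is treated as the alarm. Below is a minimal sketch assuming the monitor runs somewhere that doesn’t share the house’s power and Internet, such as a battery-backed device or an off-site machine. `HeartbeatMonitor` is a hypothetical name, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Dead man's switch: silence for too long counts as a failure."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_beat: Optional[float] = None

    def beat(self) -> None:
        """Record that the home system just checked in."""
        self.last_beat = self.clock()

    def is_down(self) -> bool:
        """Return True if no heartbeat arrived within the timeout."""
        if self.last_beat is None:
            return True  # never heard from the system at all
        return self.clock() - self.last_beat > self.timeout_seconds


# Fake clock so the example runs instantly.
now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=60, clock=lambda: now[0])

monitor.beat()
now[0] = 30.0
print(monitor.is_down())  # False

now[0] = 120.0
print(monitor.is_down())  # True
```

The timeout is the trade-off: shorter means faster detection but more false alarms from a briefly flaky connection.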
The system needs to be more self-aware.
Care technology should care if it can’t work. And it needs to be able to reach out if it can’t. It can’t just sit there with a thumb up its ass watching.
Everyone needs to know what the hell to do in case of emergency.
No matter how many battery and cellular backups we put in place, some things may just not work if the Internet or power is out. So there need to be clear instructions for everyone involved in care on how to handle that situation.
I’m staring down a hell of a lot of work, but it’s going to be worth it. Fires, massive storms, blackouts. A lotta scary shit can happen, but the technology pieces are there to put the right puzzle together.