Perspectives of a blackout
At 12.15pm EDT on August 14 2003, a worker at the Indiana-based Midwest Independent Transmission System Operator (MISO) fixes incorrect data on a ‘state estimator’ – a tool used to monitor power flow in the region. But, critically, he or she fails to restart the monitoring tool, sparking a sequence of events that leads to a severe blackout affecting 55 million people in the US and Canada.
700 miles east in New York City, fridge-freezers are unable to keep beer and ice cream cold, leading to spontaneous block parties in the streets, as recalled by Eugene Hütz in Oh No by Gogol Bordello, US100’s Track 9.
I wanted to contrast the two perspectives so I spoke with Ethel Bessem, a New York City local who was living in Queens at the time (and for podcast listeners, our very own US100 Fact Checker), and Ranjit Chagar, a control systems engineer of 30 years with significant experience in monitoring complex systems, who was able to offer insight into that engineer’s very own ‘oh no’ moment.
Jarek Zaba: Ranjit, I tried to read the Wikipedia page about the blackout but I really don’t understand any of it. Can you explain how these things work in really layman’s terms?
Ranjit Chagar: Sure thing. Power transmission is an interesting engineering challenge. You have to ensure that the power people want is catered for, whether one-off events (e.g. the Super Bowl) or predictable everyday events – lights being turned on as day goes to night, heating and cooling demands as temperatures change etc.
If you fail to control the power network a number of interesting things start to happen. 1) Overloading parts of the grid can lead to the cables you see between pylons quite literally sagging – as difficult as this is to believe it does happen. When overheated, they can sag so much they can get caught on nearby objects. 2) Overloading a generating plant or its distribution equipment will result in it shutting down – with the load being put on other parts of the system. And 3) If you fail to react quickly, you will get a cascade of failures that become harder and harder to correct.
The northeast blackout of 2003 had all of these things.
JZ: So how did it all kick off?
RC: What started as a simple oversight – an operator forgot to restart the monitoring tool – cascaded into failures, as described above. Cables sagged and shorted out as they made contact with trees in Ohio, which led to demands being routed through other parts of the system. As these became overloaded, they shut down and so it went on. And it basically continued until it could not fail any more, with automatic shutdowns taking down areas of the grid, causing more load on other areas which also shut down.
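[ed: The chain reaction Ranjit describes can be sketched as a toy model. This is only an illustration under loud assumptions: the line names, loads and capacities are invented, and a tripped line's load is split equally among the surviving lines, whereas on a real grid the physics of the network decides where the power goes.]

```python
def simulate_cascade(loads, capacities, first_failure):
    """Return the order in which lines trip once `first_failure` goes down.

    Toy model (assumption): a tripped line's load is shared equally
    among the lines still in service. Real grids do not behave this simply.
    """
    loads = dict(loads)          # copy so the caller's data is untouched
    live = set(loads)            # lines still carrying power
    tripped = []                 # lines that have shut down, in order
    queue = [first_failure]      # lines waiting to shut down

    while queue:
        line = queue.pop(0)
        if line not in live:
            continue
        live.remove(line)
        tripped.append(line)
        if not live:
            break                # nothing left to fail
        # Push the lost line's load onto the survivors.
        share = loads[line] / len(live)
        for other in live:
            loads[other] += share
        # Any survivor now over its capacity shuts down next.
        for other in sorted(live):
            if loads[other] > capacities[other] and other not in queue:
                queue.append(other)
    return tripped


# Hypothetical four-line grid; all numbers are made up for illustration.
loads = {"A": 60, "B": 50, "C": 50, "D": 40}
tight_capacities = {"A": 100, "B": 65, "C": 75, "D": 90}
roomy_capacities = {name: 120 for name in loads}

print(simulate_cascade(loads, tight_capacities, "A"))
print(simulate_cascade(loads, roomy_capacities, "A"))
```

With the tight capacities, losing one line takes the rest down with it; with spare headroom on every line, the same initial failure stops where it started.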
JZ: So what’s the mood like over there in the Midwest?
RC: It must have been horrible for those trying to run the network. They had a system bug which meant they lost both audio and visual alerts for an hour. I’ve had that once, for ten minutes – it’s hard to describe the feeling. You spend the first few seconds staring at the screen, unable to believe that all the information you rely on is gone. Your body almost goes into shock. The fight or flight reflex isn’t valid so you’re basically left paralysed. Very very slowly a thought begins to form: “what the fuck do I do now?”.
You are running blind and you are stuck. If you do something it could be wrong and you’ll make things worse, but if you don’t do something it’ll continue to get worse. You argue with those around you, management shout that you need to do something, anything, since things are failing. But ultimately you know there is not a single constructive thing you can do. You end up shouting at those trying to get the system back online and operational but you know that no amount of shouting can make them work faster since they are doing their best. An hour must have been an absolute eternity.
JZ: So at this stage engineers are shitting themselves in Indiana. Ethel, what’s it like down in blacked out NYC?
Ethel Bessem: That’s a stretch of days you don’t soon forget. I was 19 or 20 years old and it was the hottest and most humid few days ever – all of life was happening outside due to the heat. It was very surreal.
And I was on a mission. I had just landed a job at a country club as a waitress in Long Island [just outside of New York City] – I had just been to collect my uniform and noticed that none of the traffic lights were functioning. I wasn’t much of a radio listener, so my first thought was: ‘man, Long Island is so budget – they can’t even afford to keep the traffic lights on’… It wasn’t until I got back to Queens that I realised the whole city was going through a massive power outage. (And I’ve since learned that Long Island being ‘budget’ is the furthest thing from the truth.)
Then I had to go into Manhattan with my girlfriend at the time. We had to collect her aunt from the Port Authority Bus Terminal – imagine Victoria coach station with the volume of King’s Cross but with everyone cramming the streets because of the blackout. The air con isn’t working, it’s 35 degrees and it’s humid. In the chaos of everything, it was a total needle in a haystack mission, but we found her eventually.
I returned to my mom’s where I was living, and had to eat the whole fridge as the food was going to expire in a matter of hours.
JZ: Quite the day. Ranjit, when things come back online, it’s all good right?
RC: Absolutely not. If anything it’s worse – you can see the state of the system and everyone is now shouting at you to sort it out. But cascade failures are making things worse, everything you do has a knock-on effect, and you have no time to understand and plan; you just have to do something. And, by the way, no one ever planned for this to happen, so there is absolutely no procedure for how to cope. Eventually you are left just watching it all happen, with a lead brick in your stomach and the overwhelming need to pee.
And then it stops. It’s gone as far as it can and it all goes quiet. Nothing more happens. And all you can do is stare at the screen, at the extent of what has happened. And then try and work out how to recover all of it.
JZ: Is that the worst of it?
RC: Well recovery isn’t easy. You have to understand why each individual element shut down, address that problem, and then bring power back online for that area. It takes ages and ages to do this, as you have to work super carefully in case you end up with another overload condition.
And just to add insult to injury, the after-effects of the power outage continue for days. Flight and transport delays can take days to clear, and there are concerns around water contamination (pumped water needs to be kept moving, so if you get backflow you can’t guarantee clean water to households), so you’re constantly reminded of what happened. And, of course, there are the inquiries and the inquests – regardless of how much they tell you they just want to know the cause, you know the company is looking for someone to blame.
JZ: Hey, at least they helped enable some good parties in NYC – and inadvertently their mistake(s) led to a damn good song.
RC: Yes and one of the most amusing and intriguing aspects of the power outage was that crime went down, compared to the same period in the year before. You’d think with lights out and no alarms working people would have taken advantage [ed: as they did in the 1977 New York City blackout] but no – a party atmosphere and a sense of camaraderie prevailed.
EB: Well I didn’t have a block party. In fact it was years later that I discovered that everyone else had the most sensational party ever.
I was talking about the blackout last year with an old colleague who happens to be from Long Island. This kid had a pool party, invited all his mates over and then barbecued all the food in the house. Meanwhile I’m in Queens living in Stevie Wonder’s village ghetto land.
Still, I think there was a birth rate spike the following year, so some people certainly enjoyed it.
JZ: New York seems to have a bit of history with blackouts.
EB: It still happens. It spikes up to 35 degrees in the summer, and the power infrastructure is terribly dated.
JZ: So Ranjit, who dropped the ball here?
RC: Ultimately you have to go back to the designers. Firstly there is no reason why key systems should be allowed to be left in a ‘bad’ state – either it should reset itself, or it should trigger an alarm so people know that it’s in a bad state. Secondly a system should not be allowed to go into a cascade failure. When something similar occurred in China recently the whole system shut itself down, with 3 million people plunged into darkness. But that enabled engineers to assess the issue, sort it out and restore power in a matter of minutes, whereas with a cascade failure you are just chasing your own tail.
The good news is that every time an incident like this occurs we learn something – so as scary as a complete power shutdown may have been to those operators in China, they would have known exactly what had happened and how to deal with it, partly as a result of incidents like the 2003 Northeast blackout.
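[ed: Ranjit's first design point, that a key system left in a ‘bad’ state should at least raise an alarm, can be sketched as a simple staleness check. This is a hypothetical illustration, not how MISO's tools worked: the class name, the five-minute threshold and the "OK"/"ALARM" labels are all assumptions.]

```python
from dataclasses import dataclass


@dataclass
class MonitorWatchdog:
    """Raises an alarm if a monitoring tool has gone quiet for too long.

    Hypothetical sketch: times are plain seconds passed in by the caller.
    """
    max_silence_s: float          # longest acceptable gap between updates
    last_update_s: float = 0.0    # when the tool last reported successfully

    def record_update(self, now_s: float) -> None:
        # Called each time the monitoring tool completes a run.
        self.last_update_s = now_s

    def check(self, now_s: float) -> str:
        # A silent tool is treated as a fault, not as "no news is good news".
        if now_s - self.last_update_s > self.max_silence_s:
            return "ALARM"
        return "OK"


# Assumed threshold: alarm after five minutes without an update.
watchdog = MonitorWatchdog(max_silence_s=300)
watchdog.record_update(now_s=0)
print(watchdog.check(now_s=120))   # tool reported two minutes ago
print(watchdog.check(now_s=3600))  # tool has been silent for an hour
```

The point of the design is that silence itself triggers the alarm, so a tool nobody remembered to restart cannot sit quietly in a bad state.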