Global Fire Alarm When Your Power Regulator Calls It Quits During The Holidays

What happens when a data center goes up in smoke, literally? In this episode, we sift through the potential ashes of a not-so-great day. Join us as we chat about servers, batteries, paperwork and holidays, and the surprisingly flammable side of computing. It’s a smoky mess, and we’re here for it.

Listen now on Apple Music, Spotify, Deezer, YouTube, or wherever you get your panic attacks.

Welcome to the official blog post for the legendary episode “Our First Datacentre Fire” from Jack Smith’s IT Horror Stories podcast! If you’ve ever worked in IT, spat out your coffee upon hearing “the backup failed,” or been paged during New Year’s Eve, settle in: this story is your spirit animal.

From retro servers and smoky basements to good old “MacGyver fixes,” come relive the drama, disasters, and some questionable change management decisions that defined IT in the early 2000s. It’s a tale packed with nostalgia, server clusters, and more than a hint of chaos.


Back to the Early 2000s

“Do you have a DeLorean?”
Sadly, no one did. But if we could rev one up to 88 mph, it would drop us straight into the world of the early 2000s: a time before Facebook, Twitter, or even MySpace.

“Back then, the Internet was still a nice place to be. No bubble, no algorithms, just raw forums and Ask Jeeves at your service.”

“Ask Jeeves, you say?”
Why not? If you could have a butler who also searched the web, wouldn’t you?

The story takes us back to that special age of chunky Nokia phones (with batteries that lasted two weeks), dial-up modems, and server rooms humming with Windows 2000 clusters.


The Setting: A Global Logistics Company

Picture this: You work IT at a massive logistics company. Think trucks, planes, trains, and a network that can’t stop. The company runs 24/7, 364 days a year. And on the one day (New Year’s Eve) when virtually everything comes to a halt, you’re on call.

This is the era when offsite backup was swapping tapes, cloud storage was a pipe dream, and network failover meant running actual fiber between concrete buildings across the business park.

“If you’ve ever plugged your laptop into a Nokia and dialed in to troubleshoot, congratulations, you’re officially ‘old school IT.’”

The On-Call Drill

Let’s set the stage:

  • On-call system: Check
  • Laptops without WiFi: Check
  • A cell phone dial-in (a Nokia, obviously): Check
  • Company on full shutdown for just this one day: Check
  • Management says, “What could go wrong?”: Check

The perfect recipe.


Disaster Strikes: The Call No One Wants

It’s New Year’s Eve, and the party is in full swing. Then, the on-call phone rings.

“I am holding my glass of champagne, I am… holding my dessert, and I get a phone call from the contact person: ‘Hi, yeah, we have a fire.’”

Of course, it’s New Year’s. Of course it’s a practical joke.
Except… it isn’t.

The Spark

Back at HQ, the security team spots an entire building drop offline: no sensors, no fire alarm, nothing. Because, well, everything was dead: “The system wants a day off, too,” they joke.

Half an hour later, someone checks, wanders in, and is greeted by a room full of smoke. No beeping, no blaring sirens, just ominous, silent smoke.

“Early fireworks? No, but no fire alarm. Because the building was offline.”
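The detail worth dwelling on is that silence itself was the only signal: the building sent nothing, so nothing raised an alarm. A minimal sketch of the opposite approach, where a missing heartbeat is treated as an alert, might look like the following. The building names, the `Heartbeat` type, and the five-minute threshold are all assumptions for illustration and not anything the company in the story actually ran.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    building: str     # hypothetical building identifier
    last_seen: float  # timestamp of the last heartbeat, in seconds


def silent_buildings(heartbeats, now, max_silence=300.0):
    """Return the buildings whose last heartbeat is older than max_silence seconds."""
    return sorted(
        hb.building for hb in heartbeats if now - hb.last_seen > max_silence
    )


# Example data: building-b has not reported for 700 seconds.
beats = [Heartbeat("building-a", 1000.0), Heartbeat("building-b", 400.0)]
print(silent_buildings(beats, now=1100.0))  # ['building-b']
```

The point of the design is that the check runs somewhere that does not share a power feed with the thing being watched, so a dead building shows up as an alert instead of as a quiet dashboard.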


Tech Failsafe: How Clusters Saved the Day

Panic at the Cluster Console

As the fire department does its thing, it’s up to IT to check the true heart of operations: the server room.

The first step? Plug that Nokia into the laptop and remote in to the fabled Windows 2000 Advanced Server clusters (yes, with the legendary blue-and-white admin consoles).

  • Clustered file servers? Check.
  • Clustered print servers? Check.
  • Databases and applications, all split between multiple buildings? DOUBLE CHECK.

Despite panic on the scene, the clusters have failed over like champs. No “split-brain,” no servers thinking they’re the only one left. Just a clean, almost-pristine cluster failover.

“I see the server room in that building is offline. Luckily, all these servers are in a panic mode, but they failed over to the active building.”
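For readers who have not met “split-brain” before: it is what happens when both halves of a cluster lose sight of each other and each decides it is the survivor. One common guard is a majority vote. The sketch below illustrates that general idea only; it is not a description of how the Windows 2000 cluster service in the story arbitrated, and the node names and the third “witness” voter are hypothetical.

```python
def partition_can_run(partition: set[str], all_voters: set[str]) -> bool:
    """A partition may host the clustered services only with a strict majority of votes."""
    return len(partition & all_voters) * 2 > len(all_voters)


# Hypothetical layout: one node per building plus a tie-breaking witness.
VOTERS = {"node-a", "node-b", "witness"}

# Building A loses power: node-b still reaches the witness and keeps running.
print(partition_can_run({"node-b", "witness"}, VOTERS))  # True

# A node that can see only itself must stand down, which prevents split-brain.
print(partition_can_run({"node-a"}, VOTERS))  # False
```

With an odd number of voters, at most one side of any split can hold a majority, so two “only survivors” can never run at the same time.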

Let the Calls Begin

Jack, the IT sanity anchor, starts phoning his fellow team members. Ninety percent are deep into the New Year’s party, so the call chain is full of “Happy New Year!” … “No, seriously, get sober, we have a fire.”

Preparation point? Make sure the crew isn’t fully in cocktail mode. “Good move,” as Bob notes.


Recovery Mode: MacGyvering Our Way to Uptime

Arriving at Ground Zero

Jack makes his way to the office.
His philosophy: “If I turn the corner and see flames shooting out of the building, I’m heading home.”

Instead, he’s met by a Darth Vader-style scene: the weekend shift supervisor, gas mask on, rising from the smoky basement.

“From the smoke, like in a Darth Vader style, the weekend shift supervisor from back then came up with a gas mask on.”

Why Did Everything Break?

The culprit: the building’s no-break system (basically a fancy UPS) failed spectacularly, cutting power not just to the servers but also to the emergency lighting, the alarm system, and, crucially, the fan ventilating the no-break room itself.

  • No power to fire alarm: No heads-up to anyone
  • No fan: No oxygen, so the fire couldn’t rage

In a cosmic twist, this lack of oxygen was a lucky break: it suffocated the fire before it could hit the enormous paper archive next door.

“Here’s where we got lucky: the fire was starved of oxygen because the fan died, too.”
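The failure chain above is a textbook single point of failure: everything, safety systems included, hung off one power source. A rough sketch of how you could spot that on paper is to write down who feeds whom and walk the graph. The `FEEDS` map below is reconstructed from the story (the “mains” entry and all the labels are assumptions for illustration), and `affected_by` is a hypothetical helper, not a tool anyone used at the time.

```python
from collections import deque

# Hypothetical power dependencies: consumer -> the source it is fed from.
FEEDS = {
    "servers": "no-break",
    "emergency-lighting": "no-break",
    "fire-alarm": "no-break",
    "no-break-room-fan": "no-break",
    "no-break": "mains",  # assumption: the no-break is fed from mains power
}


def affected_by(failed, feeds):
    """Return every system that loses power, directly or indirectly, when `failed` goes down."""
    down = set()
    queue = deque([failed])
    while queue:
        source = queue.popleft()
        for consumer, feed in feeds.items():
            if feed == source and consumer not in down:
                down.add(consumer)
                queue.append(consumer)
    return sorted(down)


print(affected_by("no-break", FEEDS))
# ['emergency-lighting', 'fire-alarm', 'no-break-room-fan', 'servers']
```

If the fire alarm shows up in that list, the alarm cannot tell you about the one failure most likely to start a fire.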

Cluster Heroics and the “Wooden Board Solution”

With the all-clear, the MacGyver-ing begins. An impromptu wooden board, crammed with power plugs, gets the critical systems live again. It’s January 1st, 5:30am… and the servers power up.

  • Clusters reactivated ✔
  • Business impact minimized ✔
  • Management still home… for now ✔

“On January 1, 5:30am, I was bringing up the system… MacGyver style!”


Company Culture: Change Management?

If you expect slick change management, detailed documentation, and carefully rehearsed disaster recovery runbooks… well, welcome to IT in the 2000s.

“Somebody came up to me: ‘Hey, we should do change management.’ My response? ‘Hell no. All this paperwork, we know what we’re doing. We got this.’”

No runbooks, no lessons-learned sessions. Just handwritten notes, tribal knowledge, and (the big one) having literally everyone who mattered physically onsite and sober “just in case.”

The Honor of Being There

The upside of New Year’s Eve? Every single chief was on call: electrics, plumbing, logistics, and IT. If you ever want to schedule a disaster, this was the golden window.


The “Lessons” Learned

How do you learn from a datacentre fire? It’s complicated:

  • The company didn’t lose data.
  • They didn’t lose infrastructure.
  • The only thing lost was… sleep and a bit more gray hair for IT.

Was anything changed to stop a future fire?
Separate backup power for alarms? Not really.
Dedicated circuits for safety? Probably not.

“I don’t even think it was lessons learned because, hey, the system broke, we handled it, there was no business loss, so everyone’s happy.”


The Aftermath and Reflections

Could It Have Been Worse?

Yes. Much. The paper archive could’ve gone up. If the fire had had more oxygen, or the failover hadn’t worked, it would have been catastrophic, not just for the company but for the surrounding neighborhood.

Everyone remembers the post-mortems, right? Well, not really:

  • No evidence the building burned down later (“Just turned into a parking lot”).
  • Most equipment eventually got scrapped as the company moved on.
  • Years later, they still struggled to implement process and change management.

“They advanced to the phase where they could say they had change management… but the process was just for KPIs and not for actually doing stuff.”

Tales from the Operations Trenches

  • Senior managers using their mail server to store a 60GB movie collection
  • Telnetting into mainframes trying to find Shift+F13
  • Servers living dangerously above unsuspecting accounting staff

“Did they ever figure out why the no-break failed?”
Nope. Official cause? “It just went poof.”


Classic Quotes

Let’s relive some of the iconic lines from this story:

“If I turn the corner and see flames coming out of the roof, I’m turning around.”

- Jack, keeping his New Year’s priorities straight

“We ordered an entire storage rack for the first floor. The floor couldn’t hold it, so we put some metal beams underneath to spread the weight. Problem solved.”

- Classic 2000s IT risk management

“There are no procedures for that.”

- The unofficial company motto

“I never got any reports. Hallway talk was: it just went poof.”

- When documentation fails, gossip wins


Conclusion

So what do you get when you mix early-2000s infrastructure, New Year’s Eve, and perfect cosmic timing? The near-disaster that was their first datacentre fire.

The takeaways:

  • Set up stretched clusters, and they might just save you.
  • Sometimes, having the dream team available is pure luck.
  • When the no-break goes, cross your fingers the fan dies, too.
  • Change management is still a myth at most companies.

Appendix: The 2000s Survival IT Checklist

Want to know if you’re prepared for a classic disaster?

  • [x] Nokia phone (charged for 2 weeks)
  • [x] Laptop with a dial-up modem
  • [x] Windows 2000 Advanced Server Cluster
  • [x] Knowledge of the building layout (and escape routes!)
  • [x] A wooden board for emergency “innovations”
  • [x] Coffee. Lots of coffee.


Final Thoughts

No goats were sacrificed, but plenty of servers were scared.
The more things change (cloud!), the more the old tales remain hilarious, stressful, and deeply relatable for anyone who’s ever been the only sober sysadmin at the party.

