Over the years, when someone crashes something at work I (semi-jokingly) say:
If you don't crash something every so often, you aren't working har...
In my early days, I learned (...eventually...) that complicated SQL queries with incomplete JOIN conditions on large data sets in production = non-responsive server = angry customers.
Fast forward to today, I insist that the IT department provide a dev/staging server which mimics production knowing full well that a runaway query (or never-ending loop, or recursive function with a misplaced terminal condition, or ...) can happen to anyone from time to time. Even the best of us! :)
In the early 70s, I thought I was a hotshot programmer with 5 years of experience; I could do Assembly and Cobol. I did an optimization that impressed my supervisor and got permission to examine the computer. It was an IBM 370/148, not the small one that fit in a large room and was partitioned into 4, but a big one that took up the entire floor at the bank and was partitioned into 8. This was my first time on a big machine. I examined F0, F1, F2, F4, F5, F6, then paused. Power was always on the last partition, F3, but on this big machine power must be on F7. You never look at power: the machine would have to scan every log file since it was built in order to fill the screen. So I went to F3. This big machine was actually a small machine that had been upgraded. First all the line printers quit, then all the tape drives quit, then nothing but the rattle of the giant HDD platters, then the phone rang, then every phone at the bank was ringing. 165 seconds after I pressed the wrong button, the screen filled up and everything went back to normal. I got to meet the bank president, the entire board, every manager, and the Federal Reserve guy. I said I was trying to speed up the computer. Seems every terminal in every branch also went down for 165 seconds. The Federal Reserve banned me from any bank computer room, and they took away my keyboard and terminal. I had to program with a plastic template and graph paper, get someone to keypunch it for me, and then bring the stack of cards to the computer operator to be run after hours. Making a change in the program, compiling it, and seeing the results took about 60 hours. I would sweet talk the computer operator into fixing my code. I ended up married to her; for years she kept fixing my code.
Ha, sounds worth it!
This is fantastic! I was not expecting a ❤ story. Thanks for sharing.
My body
Yeah...about 5 minutes before a demo at my company, I thought I could fix a simple SQL call that was causing some invalid data. It worked fine on my machine. But when I went to push it to the server, something went wrong and it broke the demo. The code was fine but the deployment failed. It took 15 minutes into the meeting to figure out what went wrong and fix it. Whoops. haha.
I know that temptation, trying to squeeze in as much as possible into the demo.
I one time took down a Facebook game I worked on with over 4 million daily players. The combination of an off-by-one error and infinite retry with no back off turned all the game clients into a DDoS attack on our own servers. Even the Akamai web console was unresponsive for about 10 minutes.
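For anyone wondering what the missing piece looks like, here is a minimal Python sketch of a retry loop that gives up after a fixed number of attempts and waits a randomized, exponentially growing delay between them. The names `backoff_delays` and `call_with_retry` and all of the default values are made up for illustration; this is not the code from that game, just one common way to keep clients from hammering a struggling server in lockstep.

```python
import random
import time


def backoff_delays(max_attempts=5, base=0.5, cap=30.0, rng=random.random):
    """Yield one delay per retry (max_attempts - 1 of them in total).

    Each delay is drawn uniformly from [0, ceiling), where the ceiling
    doubles on every retry and is capped at `cap` seconds.
    """
    for attempt in range(max_attempts - 1):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def call_with_retry(operation, max_attempts=5, base=0.5, cap=30.0,
                    sleep=time.sleep):
    """Call `operation` until it succeeds or the attempts run out."""
    delays = backoff_delays(max_attempts, base, cap)
    while True:
        try:
            return operation()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                # Out of retries: surface the last error instead of looping forever.
                raise
            sleep(delay)
```

The random jitter matters as much as the doubling: without it, every client that failed at the same moment retries at the same moment too.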
Seeing as my primary area of coding is vulnerability scan automation, I tend to crash a lot of things. Central web services, DNS servers, document repos, data-taking systems. I've killed them all. Each time I do it I have to spend a lot of time trying to figure out how to be kinder and more gentle to the systems on the network. My favorite is scanning for anonymously writable FTP daemons. I get a lot of complaints from users that their printers are spitting out lovely notes from me.
We've got an old records system on an AS/400 that needed the data migrated. Short of writing a fully custom app to allow editing (due to the legal need to provide redaction capability forever), we needed some kind of front end. We have a robust document management system that provides all of the retention, security, search, redaction, and managed access features that we need. We looked at a ton of options, and this was the best fit.
So, I set about turning the data into a series of reports and feeding them into the system using their SDK. The first iteration just about killed the report server. At that point I learned how to generate reports locally on my batch machine, and how to do it in parallel. The new process resulted in the queues on the server being filled up, and the resources being saturated. Everything ground to a halt. My next lesson was in tuning the import process to not be so hard on the servers. The next hurdle was, in importing a couple of hundred million files, the SAN hosting the application's storage ran out of inodes. Every server hitting that storage location had a fit. There wasn't a lot I could do about that, but after the operations team cleared the issue, add that to the list of things I know to think about for the future.
I learned about a lot of neat things during that project. It gave me a ton of great lessons that I wouldn't have learned if stuff hadn't broken.
Brought down a couple of warehouses for companies you've definitely heard of because our deployment strategy was to TeamViewer to the customer site > drop patched DLLs. Not much QA there.
I am guilty of the old UPDATE without a "WHERE" clause on an accounting production database (there was no staging environment at all).
And previously, working in IT (more hardware stuff), I plugged a computer into the network, and somehow it caused an IP conflict between the computer and a software license server, which crashed the software on every computer in the network. That one caused lots of mayhem all through the building.
I saw this earlier and thought I was just going to read along and nod to some relatable content. Then Visual Studio hiccuped and I watched the entire bin folder start throwing errors because VS thought half the files were deleted. Somehow the project builds without them but it won't publish, and I wasted enough time trying to fix it that I could have recreated the whole project.
I was trying to delete a set of scripts in a subdirectory, and typing faster than I thought led me to delete all of the scripts on our build server. I stood up immediately after realizing and walked over to the most senior developer on our team to tell him, and he said that it was no big deal. It was rsync'd to the secondary server and he could bring the scripts back with a vintage of just last night.
Many 💥, way too often.
Working directly in production databases is rare (and still a terrible idea) these days, but it used to be status quo.
Let's just say that I still tend to write my "WHERE" clauses first when writing SQL.
:)
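If you want a seatbelt on top of the habit, one option is to run the statement inside a transaction and refuse to commit when it touches more rows than you expected. Below is a rough Python sketch using the standard `sqlite3` module; the function name `guarded_update` and the `max_rows` parameter are invented for this example, and other databases and drivers will need their own equivalent.

```python
import sqlite3


def guarded_update(conn, sql, params=(), max_rows=1):
    """Run an UPDATE/DELETE and roll it back if it hits too many rows.

    `max_rows` is the caller's own estimate of how many rows the
    statement should touch (a hypothetical guard, not a sqlite3 feature).
    """
    cur = conn.execute(sql, params)
    if cur.rowcount > max_rows:
        # Most likely a missing or wrong WHERE clause: undo the change.
        conn.rollback()
        raise RuntimeError(
            f"statement touched {cur.rowcount} rows, expected at most {max_rows}"
        )
    conn.commit()
    return cur.rowcount
```

This relies on the module's default behavior of opening a transaction before an UPDATE, so anything uncommitted before the call would be rolled back along with it.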
I was fixing a friend's laptop and bricked it... twice. I have done the same thing to my own computer (the power went out during an update, only for a second, but it still bricked it; then it went out again when I was reinstalling Linux).
I once messed up the key used for a particular cache. The system worked fine with 1-2 users testing it. The first morning after going to production everything went haywire. The users were getting error messages on every screen, regardless of whether they were using the affected feature or not. Productivity ground to a halt. That was Friday. I was out of town Thursday through the following Wednesday.
I taught myself Entity Framework half a year ago because I really got tired of the way we did queries in my workplace's main product. Seems easy enough, but... Lots of fun when a seemingly innocent LINQ query refuses to do a simple join and instead fires off one query for every related item. And it doesn't slow the application down enough to be noticeable. Well, until our main product's peak time in the week.. Server memory usage:📈 Our app:💥 Me that day:😓
But the occasional EF weird-stuff beats having to deal with stored procedures and stringbuilder-made query strings.
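The one-query-per-related-item pattern is easy to reproduce outside of EF. Here is a small Python and `sqlite3` sketch of the same shape, not Entity Framework itself: the `orders`/`items` tables and the `QueryCounter` helper are invented for the illustration. Both functions return the same data, but one issues a query per order and the other issues a single JOIN.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT);
        INSERT INTO orders VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO items VALUES (1, 1, 'pen'), (2, 1, 'ink'), (3, 2, 'pad');
    """)
    return conn


class QueryCounter:
    """Thin wrapper that counts how many queries are sent."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def items_n_plus_one(db):
    """One query for the orders, then one more query per order."""
    result = {}
    for (order_id,) in db.query("SELECT id FROM orders ORDER BY id"):
        rows = db.query(
            "SELECT name FROM items WHERE order_id = ? ORDER BY id", (order_id,)
        )
        result[order_id] = [name for (name,) in rows]
    return result


def items_single_join(db):
    """The same data fetched with a single LEFT JOIN."""
    result = {}
    rows = db.query(
        "SELECT o.id, i.name FROM orders o "
        "LEFT JOIN items i ON i.order_id = o.id ORDER BY o.id, i.id"
    )
    for order_id, name in rows:
        result.setdefault(order_id, [])
        if name is not None:  # orders without items still get an empty list
            result[order_id].append(name)
    return result
```

With three orders the difference is four queries versus one, which nobody notices. With a few thousand rows at peak time, it looks a lot like the graph above.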
Filling the disk! Logging to a file in a tight loop will generate more output than you might think. I once logged to my home directory, which was network mounted; next thing you know the server was offline and no one could log in. Doh!
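One cheap guard against this is to put a hard ceiling on how much a logger can write. Here is a minimal Python sketch using the standard library's `RotatingFileHandler`; the function name `make_bounded_logger` and the size limits are just example values, and it assumes no single log line is bigger than `max_bytes`.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_bounded_logger(path, max_bytes=1_000_000, backups=3, name="bounded"):
    """Return a logger whose files stay under max_bytes * (backups + 1) in total."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Old output gets thrown away once the backups are full, so a runaway loop costs you log history instead of the disk.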
One of my early PRs at a shop doing mostly zero-downtime schema migrations turned out to... not be zero-downtime. Unexpectedly so. We got the first user email within seconds. Good times.
Once, while working as a sysadmin for my family's business, I wiped an important application server and re-installed the OS only to find out that I couldn't restore the db from a backup tape. I spent all day and night in the server room, freaking out, on long phone calls with Veritas and MS, while my uncle had to do the software's job on paper, without the benefit of the now-missing historical data.
Phone support was no help, but I eventually realized that the restore job might be crashing because, in an unrelated move, I'd also re-organized the domain's admin accounts, and the one that made the backup wasn't the same one being used to restore. (Likely a bug -- the error message was completely unrelated, and the new account had sufficient privileges.)
In the end, I was thankfully able to restore everything, but I looked like an idiot.
The first time I crashed something was at my first programming job, and my boss was really cool, he too said if I didn't break something, I might not be working hard enough. Maybe he was understanding because, despite bringing down the entire SCO/Unix system, it was on the very morning of the day we had a consultant in to upgrade to a new version. And I learned a lot about Unix that day, watching over his shoulder as he brought the system back up.
Second time it was a bug in the Unix implementation's (HP-UX) version of userdel. I was the closest thing to a DBA/Unix expert at the hospital I worked at in a small western town. A consultant was coming up the next day from the metropolitan area 200 miles to the south to hand over the reins to me, and she said I should create a user account for myself, and copy all her files over from her account into mine.
Problem was, in so doing, the entire / root partition got filled up. I recognized this instantly, and so ran userdel on my newly created account, with the idea that we might put my user account in a non-standard partition, or just wait until she showed up for other suggestions.
Immediately after doing this I noticed that all files no longer had any user and/or group associated with them. I stayed after hours and with the help of technical support almost completely restored everything from backup. The next day the consultant and I did a little research, and the issue with HP-UX's userdel is that it tried to create a copy of /etc/passwd in the / root partition, as a working copy to make changes to. Problem was because the partition was full, the working copy was 0 bytes in size, and then the original got over-written with this.
I tried to replicate this on my Linux box at home but its implementation of userdel was sane and it did not fail the same way. My employer didn't give a rat's ass though and I was no longer with them after another couple of weeks.
The service that Baylor hospital systems use to estimate cost of procedures, some storage arrays, the custom hardware that my company makes (frequently, but not in the field), a motorcycle.