Nip Tucking the Infrastructure
Beverly Hills has Dr. 90210, FX has Sean McNamara and Christian Troy, ePartment54 has me. While I'm not quite the surgeon those folks are, I do occasionally blow the dust off my handy toolkit and perform emergency computer surgery on some of my own systems, typically out of necessity, not pleasure. You've probably already heard stories of the bad luck I've had in the past with hardware, even brand new stuff. As you might guess, my network has once again gone bonkers and subsequently decided to go on a one-way trip to hardware heaven.
In this particular blog, I'm going to tell the tale of the last two weeks, which cost me dozens of hours buying, installing, upgrading, and configuring the hardware and software needed to support my bad habit of using too much electricity to move a few bits from place to place. I'll warn you ahead of time: this is going to be a long blog, so grab a beer or your beverage of choice, sit back, and enjoy the pain I've experienced thanks to technology failing me once again. If you've been within earshot of me lately, you've most likely heard me complaining about some of these issues, so some of this may be nothing new, but it may at least have some entertainment value.
So before I give you a very long story to read, here are some quick pictures showing what the old ePartment54 network looked like and what the new one looks like two weeks later.
Previous Network – Nothing too spectacular here, just a bunch of parts thrown together to move bits from one place to another. The cases themselves had poor cooling and were quite loud all around.
New Network – As you can see, quite an upgrade both aesthetically and mechanically! It was a long time coming, but that time finally arrived.
Let the Saga Begin…
I guess you could say this began close to a month ago, when I was sitting at work checking my home email via Pine. Upon selecting one of my messages, I received an error. Further investigation led me to discover that my mail server's file system had magically switched to read-only mode. Typically, at least for me, this means there were some sort of disk errors (e.g., bad sectors), and as a preventative measure, Linux remounted the file system read-only. I was somewhat annoyed, considering I had purchased the mail server's hard drive (Seagate Barracuda 160GB 7200.7) only about 4 months earlier. Even though it had a 5-year warranty, that doesn't help much when the mail server is in crisis mode.
After going home to investigate, I ended up having to run fsck manually to repair a number of disk errors. After that, the mail server operated flawlessly for about 2 weeks. Did I speak too soon? Of course; the same thing happened again. Ready to gravity test my entire network, I decided to put on my optimistic glasses and consider this an opportunity. For quite some time, I'd been considering migrating my servers from Gentoo to Ubuntu Linux, mostly due to the amount of maintenance Gentoo requires. In addition, my mail server was a sluggish Pentium II 450MHz midget compared to the rest of my systems. Searching my memory for a minute, I remembered that I had previously moved my web server into a larger Antec case complete with RAID 1. Now what did I do with that other case? Funny I should ask. After peering into the back corner of my parts n'at office closet, I noticed a nice little mid-tower case. Digging my way through the computer shrapnel, I finally dragged it out kicking and screaming. Wait, this thing feels kinda heavy, I thought to myself.
This is the one part of the story where I actually have something positive to say about my luck. The system already had a power supply, CD-ROM, floppy, and get this, a motherboard and a 1.8GHz Pentium 4 processor. Yeah, that's right, I totally forgot I'd left this thing in the closet after upgrading my web server several months ago. Perfect, I'll use these parts for the new mail server. With that in mind, I called it a night. At this point, I only intended to give the mail server a face-lift and fix some of its internal plumbing…that's where it really began.
June 28, 2006 – A Hot Day in Hard Disk Land
Well, I began my little upgrade by venturing out to Best Buy in the Waterfront. I purchased a Western Digital 160GB SATA hard drive ($79.99) and a SATA controller card ($49.99). Back at home, I prepped the new patient for surgery. After installing the new hardware, the next step was to toss Ubuntu Dapper on the system and configure it so that I could eventually swap the two systems with little downtime. Problem: how the heck do I get all the data from the old system to the new one? That's an awfully good question, but I'll answer it a little later. By the end of the night, I had a fully functional mail server complete with Postfix and Courier-IMAP, plus some other bells and whistles. Time for a nap.
June 29, 2006 – Time for a New Case
Since I had SSH access to the new beta mail server, I thought I'd check in and see how the little guy was doing while I was at work. To my surprise, the patient's hard drive was running a fever of 57 degrees Celsius. In case you don't already know, that's very hot; it should have been somewhere between 30 and 40C. Maybe it's the case? Over lunch, I went home to check whether the case had poor cooling. Eh, the cooling looked ok, but I installed another intake fan anyway to improve airflow. About five minutes later, the temperature was down to a toasty 47C. Not much better. Could it really be the case? I didn't have time to deal with it and headed back to work.
After work, I was determined to get a new case for this system, hopefully taking care of my temperature woes. I had done some previous research on computer cases designed to maximize cooling and minimize noise, and the winner was the Antec P-180. Pretty slick case, in my opinion. Since I was in a bit of a time crunch, I decided to pay a little more and buy it from CompUSA. Back at home, I performed a major transplant from the old case to the new Antec. It was a success, well, to some degree. I let the beta mail server run overnight to see what its temperatures would look like in the morning. To my chagrin, I awoke the next morning to the same temperature troubles. As you can imagine, a large number of four-letter words erupted from my mouth.
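If you'd rather catch a fever like that without logging in and eyeballing it, a small script can do the checking for you. Below is a minimal Python sketch, assuming the text output of smartmontools' `smartctl -A` (the classic attribute table, with RAW_VALUE as the tenth column) and a drive that reports attribute 194, Temperature_Celsius. The sample output is illustrative, not captured from my drives, and the 30-40C range is just the rule of thumb mentioned above.

```python
def parse_smart_raw_values(smartctl_output):
    """Return {attribute_name: raw_int} from `smartctl -A` text output.

    Assumes the classic table layout where the 10th whitespace-separated
    column is RAW_VALUE; rows with non-numeric raw values are skipped.
    """
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows begin with a numeric ID; the header row begins with "ID#".
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def temperature_ok(celsius, low=30, high=40):
    """True if the drive temperature falls inside the (assumed) healthy range."""
    return low <= celsius <= high


# Illustrative sample only, not real output from the drives in this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   057   045   000    Old_age   Always       -       57
"""

if __name__ == "__main__":
    temp = parse_smart_raw_values(SAMPLE)["Temperature_Celsius"]
    print(temp, "OK" if temperature_ok(temp) else "TOO HOT")
```

In practice you'd feed it the output of `smartctl -A` for the disk in question (from cron, say); smartmontools' own smartd daemon can also track temperatures for you via its `-W` directive.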
June 30 thru July 4, 2006 – Relax and Wait
Over the next few days, I decided to investigate the cooling issues further. Beyond that, it was the holiday, so I tried to enjoy myself for a few days. Yeah, that didn't happen. Although the new case failed to fix my temperature troubles, I was quite impressed with it. It was a jigsaw puzzle to figure out the first time, but I couldn't help imagining what my other systems would look like in that case. Needless to say, I found myself on Newegg.com ordering two more Antec P-180 cases ($119.99), complete with two Seasonic power supplies ($72.99). Thus, I waited…
July 5, 2006 – Compounding the Issue
Please consider crash and burn the theme for this day. On the good side of things, I received my Newegg shipment with the two new Antec P-180 cases and power supplies. That made three new cases at this point. So I already had a malfunctioning mail server; why not toss some more problems my way? After arriving home, I discovered my file server was acting a bit odd. While playing music streamed from that server, I was experiencing random disconnects and other strangeness. Time to diagnose the problem. Sigh. Disk issues on the file system partition. I had to run a manual fsck on the disk, and boy, did this guy have some serious issues. The end result was a non-functional operating system with all of its files dumped into the "lost+found" directory. Not only that, but the ext3 journal had a case of death, which didn't help things. Since this server also handles my internal DNS, internal name resolution stopped working. That's ok; I just pointed my machines at some other DNS servers and could still get to the internet.
This is the point where the credit card smiles a bit and I decide to show computers that they won't get the best of me. I decided to upgrade the entire network with new hardware and migrate everything to Ubuntu. Off to Best Buy I go. I picked up two Western Digital 320GB drives ($189.99) and another SATA controller card ($49.99). Why two? RAID 1, of course. In no version of reality was I going to install another system without some kind of redundancy. In this case, the redundancy was meant to protect against a single disk failure, and I considered that an acceptable level of risk. I installed the disks in the file server and began installing Ubuntu. Later that night, I had a semi-functional file server…but what about all that data? You see, the old file server had four disks in it so that data was in three places at any given time to avoid a single point of failure. Ahhh, I have a desktop system I could put the old disks in; maybe that would work. Wait, Windows can't read the ext2/3 file system, now what? Oh, I have a dual-boot desktop, that's right…Ubuntu will be able to read it.
Hooray, so I have a game plan to migrate the data. Ubuntu mounted the disks and I was ready to go, except for one thing…how am I going to get the data from point A to point B? Samba? Yeah, that's what I tried at first, until I realized that transferring 40GB of data over a 100Base-T connection takes way too long. Banging my head against the keyboard, I happened to glance over at the bookshelf in my office and noticed the tantalizing sight of an external 320GB Seagate USB/FireWire hard drive. Now c'mon, what are the chances I could plug an external USB hard drive into Ubuntu and it would actually recognize it and do the right thing? Apparently pretty good, since I was soon dancing around the office in amazement. So the tide was turning, well, maybe.
At this juncture, I began copying all of my media, documents, and configuration files from the old file server disks to the external hard drive. Once that was complete, I crossed my fingers and plugged the same external disk into my new file server, hoping it would recognize it…and it did!!! I spent the rest of the night moving data, organizing the file system, configuring my internal DNS, etc. All in all, I had a new file server with a solid configuration, and the hard disk temperatures were a cool 35-37C.
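For a rough sense of scale, the back-of-envelope math is simple: time = size × 8 / link speed. The Python sketch below assumes decimal gigabytes and treats the nominal link rate as a ceiling; the `efficiency` factor is a hypothetical knob, since real Samba throughput over 100Base-T lands below the nominal 100 Mbit/s.

```python
def transfer_minutes(size_gb, link_mbps, efficiency=1.0):
    """Minutes to move size_gb (decimal GB) over a link_mbps link.

    efficiency is the (assumed) fraction of the nominal rate you actually get;
    1.0 gives the theoretical best case.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 60


if __name__ == "__main__":
    # 40GB over 100Base-T at the theoretical maximum.
    print(round(transfer_minutes(40, 100), 1))
    # Same copy if only half the nominal rate survives overhead (assumed figure).
    print(round(transfer_minutes(40, 100, efficiency=0.5), 1))
```

Even the theoretical best case works out to roughly 53 minutes for 40GB, and real-world overhead only stretches that, which goes a long way toward explaining the appeal of a directly attached external drive.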
July 6, 2006 – Web Server Facelift
Since the file and mail servers looked so good in those new Antec P-180 cases, it was time for the web server to get some nip/tuck attention. I took the web server down for about two hours while I transplanted it from an old Antec full-tower case into a new Antec P-180. When I was done, it came back online on the first try. I'd had enough for one night and wanted to end on a good note. No other parts were upgraded, not yet at least.
July 8, 2006 – One Bad Hard Disk for Another
I thought, don't ask why, that there was a faint chance I had just happened to get one bad Western Digital 160GB drive. So I made my way down to Best Buy and picked up another. Back at home, I once again re-installed Ubuntu on the mail server, only to run into the same temperature problems. Sigh, now I had two defective Western Digital 160GB drives to wipe and return.
July 9, 2006 – Is Bigger Better?
I don't even know why I thought this might stand a chance of success. I went to Best Buy and exchanged one of the bad 160GB disks for a Western Digital 320GB disk. Why 320GB? Because I already had the file server running with two of them with no problems whatsoever. After a few hours, bada boom bada bing, the mail server was installed and, guess what, it was a chilly 35C. Thank goodness, I finally…oh, wait…I spoke too soon. I wish I had never looked at the S.M.A.R.T. diagnostics that fine night, because within two hours the disk was already reporting a "Reallocated_Sector_Ct" of 21. That's twenty-fricking-one bad sectors on the disk in just two hours. Had I had anything within arm's length to throw through the wall, I probably would have at this point. Needless to say, this drive was going back too, no questions asked.
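Catching that kind of growth is easier if you compare snapshots instead of eyeballing a single reading. Here's a minimal Python sketch, assuming two saved copies of `smartctl -A` output taken some time apart. The sample rows are hypothetical and simply mirror the jump to 21 described above, assuming the brand-new drive started at 0.

```python
def raw_value(smartctl_output, attribute):
    """Return the integer RAW_VALUE for one attribute in `smartctl -A` text."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute:
            return int(fields[9])
    raise KeyError(attribute)


def new_reallocations(before, after):
    """How many sectors were reallocated between two snapshots."""
    name = "Reallocated_Sector_Ct"
    return raw_value(after, name) - raw_value(before, name)


ROW = "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       {}"
BEFORE = ROW.format(0)   # hypothetical reading right after install
AFTER = ROW.format(21)   # hypothetical reading two hours later

if __name__ == "__main__":
    grown = new_reallocations(BEFORE, AFTER)
    # On a brand-new drive, any growth at all is a reason to send it back.
    print(grown, "RETURN IT" if grown > 0 else "looks fine")
```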
July 10, 2006 – Mail Server Reloaded
As I mentioned before, I had two mail servers running in parallel. One was freshly configured and ready to go despite its 21 bad sectors; the other was production. Since the beta mail server was still having cooling problems and had a bad hard disk, I decided to go out and buy not one, but two more disks so that I could set up the new mail server with RAID 1. Thus, I made another excursion through the bloody traffic to Best Buy. This time I picked up two Western Digital 250GB drives ($159.99). Why 250GB? Well, I wasn't going to try another Western Digital 160GB after that temperature problem. I came home, ripped apart the mail server, and re-configured it with RAID 1, migrating all the configuration, etc. Same temperature problem as before: both drives were running about 45-50C.
Four-letter words echoed through the streets of Shadyside for the next two hours. Not only had I now spent about 12 hours trying to solve these temperature issues, but I was now on my 3rd hard drive. That's quite a chunk of change, my friends. And to make things worse, any time I return a hard drive, I make sure I eradicate all data on the disk. That requires DBAN and takes about 6-9 hours, depending on the size of the disk and the wiping method I choose. So I popped in DBAN and let it wipe the disk for the rest of the night. Before going to bed, I did some research into the temperature problem. It turns out that certain models of Western Digital hard drives have a firmware bug, for lack of a better technical term, behind exactly this kind of temperature problem. Nice to find this out AFTER I BOUGHT THREE DISKS. I was disheartened and went to bed.
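The wipe time is mostly arithmetic: every pass writes the whole disk once, so hours ≈ passes × capacity / sustained write speed. Here's a hypothetical Python sketch of that estimate; the 60 MB/s write speed and the pass counts are assumptions for illustration, not measurements from DBAN or these drives, and any verification passes would add more time on top.

```python
def wipe_hours(capacity_gb, passes, write_mb_per_s):
    """Estimate hours to overwrite a disk `passes` times.

    Assumes decimal units (1 GB = 1000 MB) and a constant sustained write
    speed, which real disks don't quite have (outer tracks are faster).
    """
    total_mb = capacity_gb * 1000 * passes
    return total_mb / write_mb_per_s / 3600


if __name__ == "__main__":
    # Assumed 60 MB/s; compare a single pass against a three-pass method.
    for size in (160, 250, 320):
        for passes in (1, 3):
            print(f"{size}GB x{passes}: {wipe_hours(size, passes, 60):.1f} h")
```

Doubling the disk size or tripling the pass count scales the time linearly, which is why the choice of wiping method matters as much as the size of the disk.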
July 11, 2006 – Another Day, Another Disk
I decided to return the 2nd bad Western Digital 160GB disk, since DBAN had finished wiping it. I scurried on down to Best Buy, where the same guy was working, and once again told him this drive was defective too. He didn't ask questions, so I picked up two Seagate 160GB SATA drives ($109.99). Back at home, since I was now an expert at configuring and migrating data between systems, I had the new mail server up and running in a record 2 hours. Hooray, my first big win.
July 13, 2006 – Hardware Comes, Hardware Goes
A few days prior, I had placed an order with Newegg.com for one more Antec P-180 case and a Seasonic power supply. Well, it arrived right on time. Before using these parts for the last upgrade, my router, I decided to make some returns. I was practically out of cash. At this point, I knew I had at least three hard drives to return. Would they believe that all three were actually defective? To me they were, but would Best Buy believe me? Would the same guy who handled my returns before be there? I thought about employing a merchandise return strategy, but decided against it. The customer is always right, darn it. I walked in and told the person at the counter that I had three defective drives. Surprisingly, that wasn't a problem, so I went back and picked up two more Seagate 160GB SATA drives ($109.99). Heck, I even added the new Pearl Jam album to my purchase just for fun.
A few hours later, I had a fully functional Ubuntu router running on RAID 1. Boy, my luck is really starting to change.
July 16, 2006 – The Grand Finale
Whew, I finally had a network again, and man, did it feel good to come home and actually relax without having to get out the Phillips screwdriver and mess with computers. However, two things were still bugging the heck out of me. First, my web server was the only server that didn't have SATA drives. Second, I previously had the monitor for my servers sitting on top of two of the servers, and I certainly wasn't going to do that with my slick new cases. What a predicament.
My solution: buy more stuff. I once again returned to Best Buy, and I'm surprised I'm not on their most wanted list by now. I picked up a 19" Samsung LCD display ($249.99), a Sanus mounting bracket ($66.99), and two more Seagate 160GB SATA drives ($109.99). Back at home, I had some serious construction to do. I wanted to mount my new LCD display above my servers and pitch the old 17" CRT from my college days. With relative ease, I got the LCD mounted and working. I forgot to pick up a SATA controller card, though, so I still have to install the disks and upgrade the web server to Ubuntu.
For the record, here's the hard drive tally:
- Western Digital 320GB WD3200KS (installed)
- Western Digital 320GB WD3200KS (installed)
- Western Digital 320GB WD3200KS (returned, bad sectors)
- Western Digital 160GB WD1600JSRTL (returned, bad temperature sensor)
- Western Digital 160GB WD1600JSRTL (returned, bad temperature sensor)
- Western Digital 250GB WD2500KSRTL (returned, bad temperature sensor)
- Western Digital 250GB WD2500KSRTL (returned, bad temperature sensor)
- Seagate 160GB ST3160812AS (installed)
- Seagate 160GB ST3160812AS (installed)
- Seagate 160GB ST3160812AS (installed)
- Seagate 160GB ST3160812AS (installed)
- Seagate 160GB ST3160812AS (installed)
- Seagate 160GB ST3160812AS (installed)
Lessons learned:
- Always monitor your disks' S.M.A.R.T. statistics using something like smartmontools
- The Gentoo and Knoppix boot CDs are your friends
- Never assume that replacing a bad disk with another of the same model will fix the problem
- RAID 1 everything, if it's appropriate for your configuration
- Having an external 320GB hard drive that works under Linux cures plenty of headaches
- Backup, backup, backup
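In the spirit of RAID 1 everything and monitoring everything, it also pays to notice when a mirror quietly loses a disk. Linux software RAID reports array state in /proc/mdstat, where a healthy two-disk mirror shows something like `[2/2] [UU]` and a degraded one shows `[2/1] [U_]`. The Python sketch below parses that text; the sample arrays and device names are hypothetical, and `mdadm --detail` or mdadm's `--monitor` mode are the more thorough tools for the job.

```python
import re

# Matches the "[expected/active] [UU]" status that follows each array line.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Hypothetical /proc/mdstat contents: md0 healthy, md1 missing a member.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      312568576 blocks [2/2] [UU]

md1 : active raid1 sdc1[0]
      156288256 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live box you'd pass it `open("/proc/mdstat").read()` from a cron job and send yourself an email whenever the list isn't empty.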
I hope you all made it to the end of this story. Plenty of people have been asking what I've been up to over the past few weeks, since I've been in such a bad mood all the time. I hope this helps explain.
Some additional footage
Migrating data – This premiere footage comes from one of the 4 data migrations from server to server. As you can see, I had to use a combination of a Mac, an external hard drive, a desktop, and a server to get anything accomplished.
DD My Friend – In the early days of this mess, I was trying to dd things from one disk to another. That didn't work out so well. If you pay close attention, you can gauge my anger level by looking at the "mkdir" statements.
Crap I Tell You – I believe this image shows one of the many bad Western Digital hard drives. Whatever you do, don't buy these things unless you don't care about monitoring their S.M.A.R.T. statistics.
Misc Parts – Not sure what all this is, but I can tell you one thing. You see those fans? Those are Zalman fans and they are huuuuge. I had to return them to Newegg.com.
For Sale – Anyone need some 120 or 160GB Seagate hard disks with a 5-year warranty? If so, let me know because these disks are fully operational.
Mounted Monitor – First of its kind in my apartment. I've never mounted a monitor for computing purposes before, but this seemed like a reasonable thing to do considering my setup.
Yard Sale – As you can see, I have two full-tower Antec cases and three other mid-tower cases for sale. Need one? Lemme know.
Back to Normal – After a hard two weeks of swapping parts between systems to get things done, I finally managed to clean things up late last night. Everything is back in order.
I still send friends to this Story/Article to show what a true geek is all about..
Holla~!!
-lo