[olug] Machine Locking up, need hardware guru advice
Miller, Scott L (Omaha Networks)
scott.l.miller at hp.com
Mon Aug 9 16:16:34 UTC 2004
Hi all,
I've had random hard lockups on a self-built PC for a long while now. It's getting to be a real PITA, as it managed to wipe out my root partition this past weekend.
Requisite info:
ASUS A7N8X Deluxe mobo
AMD Athlon XP 2800+ (Barton, 2.083 GHz)
1 Gig RAM
MSI nVidia GeForce FX5200 Video Card
Primary IDE Channel  : 60 Gig HD (WD I think) & 52x CD-ROM
Secondary IDE Channel: 20 Gig HD (also WD I think) & 24?x16?x52x CD-RW
OS - I don't think it matters. It was Mandrake 9.1 until this weekend's crash claimed the root partition, then the Knoppix CD I received at the latest install fest. It has also locked up when running the new Novell-supplied SUSE distribution that I installed at the install fest on a 160 Gig hard drive. I could also test with Win2K some more, but haven't yet...
Symptoms:
Locks up hard - no keyboard/mouse response at all; only the reset or power button will get it to reboot.
Troubleshooting steps taken:
First off, heat is not a problem. The system is water cooled, and the processor temp is monitored by 2 sensors: one built into the mobo, the other a probe mounted next to the processor and connected to a Digital Doc 5. From all the readings I've taken, no part of the system has ever gotten above 100 degrees Fahrenheit. The DigDoc5 monitors 8 locations; I'm monitoring incoming air, video card processor, memory, northbridge heatsink, processor, drive area, power supply and something else I can't remember offhand. (typing at work, machine's at home)
Ok, so I first thought RAM was the problem, but I've swapped that a few times, and run memtest a bunch. No lockups during that process, and the memory tests are clean. I also thought maybe it was a driver issue with Mandrake, until the CD version of Knoppix exhibited the same behavior. I also used to think it might have been the USB mouse/keyboard, but swapping those for a PS/2 mouse and keyboard didn't make any difference.
So, once the crash ate the root partition, I booted up with the Knoppix CD to attempt a fix, but it was toast. Then I tried to reinstall Mandrake and got mostly finished, but then it locked up. I rebooted, it seemed to be fine, I started configuring, and I had another lockup. This time it ate Perl, and thus wouldn't let me into X-Windows. So, I abandoned that and began to only troubleshoot.
I again grabbed the Knoppix CD, booted it, and ran the memtest program for about an hour. No lockups, no errors. No subsequent memtest runs ever resulted in a lock up.
Now, for those who are not familiar with the Knoppix CD, it is an entire Linux installation on CD. When it boots it creates a RAM drive to store the various things like /etc, /home, etc. So, there is no hard drive involved when it first comes up. This is important because as long as I left the hard drives alone, the machine was stable and running well for hours at a time. I used that time to search the net for other descriptions of problems similar to mine, and I ended up upgrading my BIOS during that search. The BIOS update didn't help at all, but also didn't hurt anything (that I can tell).
Once I got the hard drives involved, that's when the machine locked up. I started testing the first hard drive, thinking there might be some bad blocks. Now to be totally fair, I was able to get a random read/write non-destructive test of the root partition to complete 2 or 3 times. However, it was the 7 to 10 times that the random lockups happened during this process that has led me to believe that the mobo chipset, or the Linux drivers for said chipset, are the real culprit. I ruled out the actual hard drive by also testing on a blank partition I had on the secondary 20 Gig HD, and it locked up during that test as well. BTW, no bad blocks were ever found on the hard drives.
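For reference, the kind of write-then-read-back check described above can be sketched in a few lines of Python. This is only an illustration, not the tool used for the tests in this post; the function name stress_file, the block sizes, and the scratch-file location are all made up for the example. It works on an ordinary scratch file rather than a raw partition, so it is non-destructive. Because the read-back may be served from the OS page cache, the file should probably be larger than RAM if the goal is to exercise the drive and controller.

```python
import hashlib
import os
import tempfile


def stress_file(path, block_size=1024 * 1024, blocks=64, passes=3):
    """Write random blocks to `path`, read them back, and count mismatches.

    Hypothetical helper for illustration only. Returns the number of
    blocks whose read-back checksum differs from what was written.
    """
    mismatches = 0
    for _ in range(passes):
        digests = []
        # Write phase: fill the file with random data, remembering a
        # checksum for every block.
        with open(path, "wb") as f:
            for _ in range(blocks):
                data = os.urandom(block_size)
                digests.append(hashlib.sha256(data).digest())
                f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        # Read phase: compare each block against its recorded checksum.
        with open(path, "rb") as f:
            for expected in digests:
                data = f.read(block_size)
                if hashlib.sha256(data).digest() != expected:
                    mismatches += 1
    return mismatches


if __name__ == "__main__":
    # Assumed scratch location: the default temp directory. Point this
    # at a filesystem on the drive under test instead.
    fd, scratch = tempfile.mkstemp(prefix="iostress_")
    os.close(fd)
    try:
        print("mismatched blocks:", stress_file(scratch, blocks=8, passes=1))
    finally:
        os.remove(scratch)
```

On a machine with this symptom the interesting result is less the mismatch count than whether the loop finishes at all, so running it repeatedly against each drive in turn is the closer analogue of the tests above.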
I'd also thought about conflicts with the CD drives, so I removed the CD-ROM drive that was sharing the primary channel with the 60 Gig HD. Didn't matter. It still locked up.
So, does anyone have any suggestions of what more to test? Or maybe what program to use under Windows to really stress test the hard drive/controller?
Thanks,
-Scott