[OLUG] Isolating flaky hardware problems
tetherow at nol.org
Thu Feb 10 17:37:40 UTC 2000
On 9 Feb, Dave Burchell wrote:
> Vincent says:
>
>> Dave Burchell wrote:
>> >
>> > I've got some hardware that may be flaky, and I need some advice on
>> > narrowing down the problem.
>> >
>> > Long story short, how do I isolate possible CPU or RAM intermittent
>> > failures?
>
>> Well, that wasn't a very good attempt at making a long story short :)
>
> Doh! I meant to say that the above sentence was long-story-short, and
> the long story was everything else. (Now my _posts_ are flaky...)
>
>> Given the choice, you're probably right to assume it's RAM and not the
>> CPU. If it were the CPU, I doubt it would have been able to give you an
>> error message at all.
>> Considering it's a 200MHz system, I doubt all 128MB RAM is original and
>> matching. I would take a look at it and see if the SIMM pairs in each
>> bank are identical. I mean identical too, not just speed and capacity.
>> Mismatched pairs can cause some weird problems. Also make sure that all
>> banks are the same type. You can also get some issues if one bank is ECC
>> or parity and another isn't. Once you've eliminated those possibilities, I
>> would try locating the bad bank, and then the bad SIMM by deduction.
>
> Thanks for the ideas, V. I'm going to check the SIMMs for uniformity.
> I'll also check for mismatched gold/silver contacts now that I think of
> it. If I try to locate the bad bank and SIMM by deduction, what can I
> use to really hammer on the memory? Should I just write a Perl script
> that generates a huge dataset to suck up all the memory? Should I
> disable the swap?
Check out the following two from Freshmeat:

memtester is a user-space utility for testing the memory subsystem in a
computer to determine if it is faulty. It does a reasonably good job of
finding intermittent faults and non-deterministic faults. It has many tests
to help catch borderline memory, and generates a verbose report of faults
found, tests run, and time taken.
Download: http://www.qcc.sk.ca/~charlesc/software/memtester/#download (3147 hits)
Homepage: http://www.qcc.sk.ca/~charlesc/software/memtester/

Memtest-86 is a very thorough, stand-alone memory test for x86 and Pentium
systems (and compatibles).
Download: http://reality.sgi.com/cbrady_denver/memtest86/memtest86-2.1.tar.gz (2268 hits)
Homepage: http://reality.sgi.com/cbrady_denver/memtest86/ (3619 hits)

> I'd guess the machine is about 3 years old. My user has been using NT
> pretty much all this time without many complaints (that I know of; I'll
> press him for more background). If a machine _has_ been working mostly
> ok with NT then does that mean it most likely was actually ok and
> developed a recent problem? Or could it be that NT just didn't fully
> use the system (or stress the system in the same way Linux does) and
> thus didn't uncover the problem, which was there from the start? Is
> this problem really new at all?
How would you know? Stuff dies all the time in the MS world ;)
------------------------------------------------------------------------
Sam Tetherow tetherow at nol.org
Director of Development
Nebrask@ Online http://www.nol.org/
-------------------------------------------------------------------------
Sent by OLUG Mailing list Manager, run by ezmlm. http://olug.bstc.net/
To unsubscribe: `echo unsubscribe | mail olug-unsubscribe at bstc.net`