Memtest86 - A Stand-alone Memory Diagnostic

Memtest86 is a thorough, stand-alone memory test for x86 architecture computers. BIOS based memory tests are only a quick check and often miss many of the failures that are detected by Memtest86.

Memtest86 is released under version 2 of the GNU General Public License (GPL). There are no restrictions for use, private or commercial, and it may be freely distributed.



Memtest86 2.8 Release (18/Oct/2001)

Enhancements in v2.8

Version 2.8 is the preferred release. The 2.7 release is provided here as an alternative.

Since Memtest86 is a standalone program it does not require any operating system support for execution. It can be used with any PC regardless of what operating system, if any, is installed. The test image may be loaded from a floppy disk or may be loaded via LILO on Linux systems. Any Unix, Windows or DOS system may be used to create a boot floppy.


Online Commands

Memtest86 has a limited number of online commands. Online commands provide control over cache settings, error report modes, test selection and test address range. A help bar is displayed at the bottom of the screen listing the available online commands.

  Command  Description

  ESC   Exits the test and does a warm restart via the BIOS.

  c     Enters test configuration menu
	    Menu options are:
               1) Cache mode
               2) Test selection
               3) Address Range
               4) Error Summary
               5) Error Report Mode
               6) Restart Test
               9) Reprint Screen

  SP    Set scroll lock (Stops scrolling of error messages)
	Note: Testing is stalled when the scroll lock is
	set and the scroll region is full.

  CR    Clear scroll lock (Enables error message scrolling)
	

Error Display

Memtest has two options for reporting errors. The default is to report individual errors. Memtest is also able to create patterns used by the Linux BadRAM feature. This slick feature allows Linux to avoid bad memory pages. Details about the BadRAM feature can be found at: http://home.zonnet.nl/vanrein/badram

For individual errors the following information is displayed when a memory error is detected. An error message is only displayed for errors with a different address or failing bit pattern. All displayed values are in hexadecimal.

Tst: Test Number

Failing Address: Failing memory address

Good: Expected data pattern

Bad: Failing data pattern

Err-Bits: Exclusive or of good and bad data (this shows the position of the failing bit(s))

Count: Number of consecutive errors with the same address and failing bits
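
The relationship between the Good, Bad and Err-Bits fields can be illustrated with a short C sketch. This is only an illustration of the exclusive-or described above, not the code used by Memtest86; the struct and function names (mem_error, make_error, failing_bit_count) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record holding the fields shown on the error line. */
struct mem_error {
    uint32_t address;   /* Failing Address */
    uint32_t good;      /* Good: expected data pattern */
    uint32_t bad;       /* Bad: data actually read back */
    uint32_t err_bits;  /* Err-Bits: good XOR bad */
};

/* Build an error record; only meaningful when a mismatch was seen. */
struct mem_error make_error(uint32_t address, uint32_t good, uint32_t bad)
{
    struct mem_error e;

    assert(good != bad);
    e.address = address;
    e.good = good;
    e.bad = bad;
    e.err_bits = good ^ bad;   /* set bits mark the failing positions */
    return e;
}

/* Count how many bit positions differ. */
int failing_bit_count(uint32_t err_bits)
{
    int n = 0;

    while (err_bits != 0) {
        err_bits &= err_bits - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

For example, a Good value of ffffffff and a Bad value of ffffffef give Err-Bits of 00000010, a single failing bit at bit position 4.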


Troubleshooting Memory Errors

Please be aware that not all errors reported by Memtest86 are due to bad memory. The test implicitly tests the CPU, L1 and L2 caches as well as the motherboard. It is impossible for the test to determine what causes the failure to occur. However, most failures will be due to a problem with memory. When it is not, the only option is to replace parts until the failure is corrected.

Once a memory error has been detected, determining the failing SIMM/DIMM module is not a clear cut procedure. With the large number of motherboard vendors and possible combinations of SIMM slots it would be difficult if not impossible to assemble complete information about how a particular error would map to a failing memory module. However, there are steps that may be taken to determine the failing module. Here are four techniques that you may wish to use:

1) Removing modules
This is the simplest method for isolating a failing module, but it may only be employed when one or more modules can be removed from the system. By selectively removing modules from the system and then running the test you will be able to find the bad modules. Be sure to note exactly which modules are in the system when the test passes and when the test fails.

2) Rotating modules
When none of the modules can be removed then you may wish to rotate modules to find the failing one. This technique can only be used if there are three or more modules in the system. Change the location of two modules at a time. For example put the module from slot 1 into slot 2 and put the module from slot 2 in slot 1. Run the test and if either the failing bit or address changes then you know that the failing module is one of the ones just moved. By using several combinations of module movement you should be able to determine which module is failing.

3) Replacing modules
If you are unable to use either of the previous techniques then you are left to selective replacement of modules to find the failure.

4) Avoiding allocation
The printing mode for BadRAM patterns is intended to construct boot time parameters for a Linux kernel that is compiled with BadRAM support. This work-around makes it possible for Linux to reliably run on your average damaged RAM (or clearly panic if it cannot). For more information on BadRAM support for Linux, sail to http://home.zonnet.nl/vanrein/badram

Sometimes memory errors show up due to component incompatibility. A memory DIMM/SIMM may work fine in one system and not in another. This is not uncommon and is a source of confusion. The components are not necessarily bad but certain combinations may need to be avoided.

I am often asked about the reliability of errors reported by Memtest86. In the vast majority of cases errors reported by the test are valid. There are some systems that cause Memtest86 to be confused about the size of memory and it will try to test non-existent memory. This will cause a large number of consecutive addresses to be reported as bad and generally there will be many bits in error. If you have a relatively small number of failing addresses and only one or two bits in error you can be certain that the errors are valid. Also intermittent errors are always valid.

All valid memory errors should be corrected. It is possible that a particular error will never show up in normal operation. However, operating with marginal memory is risky and can result in data loss and even disk corruption. You can be sure that Murphy will get you if you know about a memory error and ignore it.

Memtest86 can not diagnose many types of PC failures. For example a faulty CPU that causes Windows to crash will most likely just cause Memtest86 to crash in the same way.


Execution Time

The time required for a complete pass of Memtest86 will vary greatly depending on CPU speed, memory speed and memory size. Here are the execution times from a PentiumII-366 with 64mb of RAM:

  Test 0                    0:05
  Test 1                    0:18
  Test 2                    1:02
  Test 3                    1:38
  Test 4                    8:05
  Test 5                    1:40
  Test 6                    4:24
  Test 7                    6:04
  Total (default tests)    23:16
  Test 8                   12:30
  Test 9                   49:30
  Test 10                  30:34
  Test 11                3:29:40
  Total (all tests)      5:25:30

Memtest86 continues executing indefinitely. The pass counter increments each time that all of the selected tests have been run. Generally a single pass is sufficient to catch all but the most obscure errors. However, for complete confidence when intermittent errors are suspected, testing for a longer period is advised.


Memory Testing Philosophy

There are many good approaches for testing memory. However, many tests simply throw some patterns at memory without much thought or knowledge of the memory architecture or how errors can best be detected. This works fine for hard memory failures but does little to find intermittent errors. The BIOS based memory tests are useless for finding intermittent memory errors.

Memory chips consist of a large array of tightly packed memory cells, one for each bit of data. The vast majority of the intermittent failures are a result of interaction between these memory cells. Often writing a memory cell can cause one of the adjacent cells to be written with the same data. An effective memory test should attempt to test for this condition. Therefore, an ideal strategy for testing memory would be the following:

1) write a cell with a zero
2) write all of the adjacent cells with a one, one or more times
3) check that the first cell still has a zero

It should be obvious that this strategy requires an exact knowledge of how the memory cells are laid out on the chip. In addition there is a never ending number of possible chip layouts for different chip types and manufacturers making this strategy impractical. However, there are testing algorithms that can approximate this ideal.


Memtest86 Test Algorithms

Memtest86 uses two algorithms that provide a reasonable approximation of the ideal test strategy above. The first of these strategies is called moving inversions. The moving inversion test works as follows:

1) Fill memory with a pattern
2) Starting at the lowest address
   2a check that the pattern has not changed
   2b write the pattern's complement
   2c increment the address
   repeat 2a - 2c
3) Starting at the highest address
   3a check that the pattern has not changed
   3b write the pattern's complement
   3c decrement the address
   repeat 3a - 3c
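
The steps above can be sketched in C roughly as follows. This is a minimal illustration over an ordinary buffer, not the Memtest86 source; the function name moving_inversions and its error-counting return value are assumptions made for the example. In step 3 the memory holds the complement written in step 2, so that is the value checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Run one moving inversions pass over buf[0..n) and return the
   number of mismatches seen. The name and interface are hypothetical. */
size_t moving_inversions(volatile uint32_t *buf, size_t n, uint32_t pattern)
{
    const uint32_t inverse = (uint32_t)~pattern;
    size_t errors = 0;
    size_t i;

    assert(buf != NULL);

    /* 1) Fill memory with the pattern. */
    for (i = 0; i < n; i++)
        buf[i] = pattern;

    /* 2) From the lowest address up: check, then write the complement. */
    for (i = 0; i < n; i++) {
        if (buf[i] != pattern)
            errors++;
        buf[i] = inverse;
    }

    /* 3) From the highest address down: check the complement written in
          step 2, then write its complement (the original pattern). */
    for (i = n; i-- > 0; ) {
        if (buf[i] != inverse)
            errors++;
        buf[i] = pattern;
    }
    return errors;
}
```

On working memory the function returns 0 and leaves the buffer filled with the original pattern.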

This algorithm is a good approximation of an ideal memory test but there are some limitations. Most high density chips today store data 4 to 16 bits wide. With chips that are more than one bit wide it is impossible to selectively read or write just one bit. This means that we cannot guarantee that all adjacent cells have been tested for interaction. In this case the best we can do is to use some patterns to ensure that all adjacent cells have at least been written with all possible one and zero combinations.

It can also be seen that caching, buffering and out of order execution will interfere with the moving inversions algorithm and make it less effective. It is possible to turn off cache but the memory buffering in new high performance chips can not be disabled. To address this limitation a new algorithm I call Modulo-X was created. This algorithm is not affected by cache or buffering. The algorithm works as follows:

1) For starting offsets of 0 - 19 do
   1a write every 20th location with a pattern
   1b write all other locations with the pattern's complement
      repeat 1b one or more times
   1c check every 20th location for the pattern

This algorithm accomplishes nearly the same level of adjacency testing as moving inversions but is not affected by caching or buffering. Since separate write passes (1a, 1b) and the read pass (1c) are done for all of memory we can be assured that all of the buffers and cache have been flushed between passes. The selection of 20 as the stride size was somewhat arbitrary. Larger strides may be more effective but would take longer to execute. The choice of 20 seemed to be a reasonable compromise between speed and thoroughness.
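
A rough C sketch of the Modulo-X steps is shown below. It is an illustration only, not the Memtest86 source; the function name modulo_x, the repeats parameter and the STRIDE macro are assumptions for the example. It also assumes that starting offsets 0 through 19 are used, since with a stride of 20 those offsets cover every location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STRIDE 20   /* stride size chosen in the text */

/* Run the Modulo-X steps over buf[0..n) and return the number of
   mismatches seen. The name and interface are hypothetical. */
size_t modulo_x(volatile uint32_t *buf, size_t n, uint32_t pattern, int repeats)
{
    const uint32_t inverse = (uint32_t)~pattern;
    size_t errors = 0;
    size_t offset, i;
    int r;

    assert(buf != NULL && repeats >= 1);

    for (offset = 0; offset < STRIDE; offset++) {
        /* 1a) Write every 20th location with the pattern. */
        for (i = offset; i < n; i += STRIDE)
            buf[i] = pattern;

        /* 1b) Write all other locations with the complement,
               one or more times. */
        for (r = 0; r < repeats; r++)
            for (i = 0; i < n; i++)
                if (i % STRIDE != offset)
                    buf[i] = inverse;

        /* 1c) Check every 20th location for the pattern. */
        for (i = offset; i < n; i += STRIDE)
            if (buf[i] != pattern)
                errors++;
    }
    return errors;
}
```

Because each write pass and the read pass walk the whole buffer separately, the checked locations are read long after they were written, which is the property the text relies on to defeat caching and buffering.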


Individual Test Descriptions

Memtest86 executes a series of numbered test sections to check for errors. These test sections consist of a combination of test algorithm, data pattern and cache setting. The execution order for these tests was arranged so that errors will be detected as rapidly as possible. Tests 8, 9, 10 and 11 are very long running extended tests and are only executed when extended testing is selected. The extended tests have a low probability of finding errors that were missed by the default tests. A description of each of the test sections follows:

Test 0 [Address test, walking ones, no cache]

Tests all address bits in all memory banks by using a walking ones address pattern.

Test 1 [Moving Inv, ones&zeros, cached]

This test uses the moving inversions algorithm with patterns of only ones and zeros. Cache is enabled even though it interferes to some degree with the test algorithm. With cache enabled this test does not take long and should quickly find all "hard" errors and some more subtle errors. This test is only a quick check.

Test 2 [Address test, own address, no cache]

Each address is written with its own address and then is checked for consistency. In theory previous tests should have caught any memory addressing problems. This test should catch any addressing errors that somehow were not previously detected.
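
The idea behind this test can be sketched in C as follows. This is only an illustration on an ordinary buffer, not the Memtest86 source; the function name own_address_test is hypothetical, and uintptr_t is used so that a location can hold its own address on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write each location with its own address, then verify.
   Returns the number of mismatches. The name is hypothetical. */
size_t own_address_test(volatile uintptr_t *buf, size_t n)
{
    size_t errors = 0;
    size_t i;

    assert(buf != NULL);

    /* Write pass: every location receives its own address. */
    for (i = 0; i < n; i++)
        buf[i] = (uintptr_t)&buf[i];

    /* Read pass: a location holding another address points to an
       addressing problem (two addresses selecting the same cell). */
    for (i = 0; i < n; i++)
        if (buf[i] != (uintptr_t)&buf[i])
            errors++;

    return errors;
}
```

Since every location holds a unique value, two addresses that select the same physical cell will show up as a mismatch at one of them.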

Test 3 [Moving inv, 8 bit pat, cached]

This is the same as test one but uses an 8 bit wide pattern of "walking" ones and zeros. This test will better detect subtle errors in "wide" memory chips. A total of 20 data patterns are used.

Test 4 [Moving inv, 32 bit pat, cached]

This is a variation of the moving inversions algorithm that shifts the data pattern left one bit for each successive address. The starting bit position is shifted left for each pass. To use all possible data patterns 32 passes are required. This test is effective in detecting data sensitive errors in "wide" memory chips.
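
The pattern generation described above might look like the following C sketch. It covers only how the data pattern varies by address and pass, not the moving inversions checking itself. The function names are hypothetical, and the sketch assumes a single one bit that wraps around to bit 0 after bit 31; the text does not say which base pattern Memtest86 actually shifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pattern for the word at the given index during the given pass:
   a single one bit, shifted left once per address and once per pass.
   Wrapping after bit 31 is an assumption of this sketch. */
uint32_t shifted_pattern(unsigned pass, size_t index)
{
    unsigned bit = (unsigned)((pass + index) % 32u);

    return (uint32_t)1 << bit;
}

/* Fill buf[0..n) with the shifting pattern for one pass. */
void fill_shifted(volatile uint32_t *buf, size_t n, unsigned pass)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        buf[i] = shifted_pattern(pass, i);
}
```

After 32 passes every word has been written with the one bit in each of its 32 positions, which matches the pass count given in the text.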

Test 5 [Block move, 64 moves, cached]

This test stresses memory by using block move (movsl) instructions and is based on Robert Redelmeier's burnBX test. Memory is initialized with shifting patterns that are inverted every 8 bytes. Then 4mb blocks of memory are moved around using the movsl instruction. After the moves are completed the data patterns are checked. Because the data is checked only after the memory moves are completed it is not possible to know where the error occurred. The addresses reported are only for where the bad pattern was found. Since the moves are constrained to an 8mb segment of memory the failing address will always be less than 8mb away from the reported address. Errors from this test are not used to calculate BadRAM patterns.

Test 6 [Modulo 20, ones&zeros, cached]

Using the Modulo-X algorithm should uncover errors that are not detected by moving inversions due to cache and buffering interference with the algorithm. As with test one only ones and zeros are used for data patterns.

Test 7 [Moving inv, ones&zeros, no cache]

This is the same as test one but without cache. With cache off there will be much less interference with the test algorithm. However, the execution time is much, much longer. This test may find very subtle errors missed by previous tests.

Test 8 [Block move, 512 moves, cached]

This is the first extended test. This is the same as test #5 except that we do more memory moves before checking memory. Errors from this test are not used to calculate BadRAM patterns.

Test 9 [Moving inv, 8 bit pat, no cache]

By using an 8 bit pattern with cache off this test should be effective in detecting all types of errors. However, it takes a very long time to execute and there is a low probability that it will detect errors not found by the previous tests.

Test 10 [Modulo 20, 8 bit, cached]

This is the first test to use the Modulo-X algorithm with a data pattern other than ones and zeros. This combination of algorithm and data pattern should be quite effective. However, its very long execution time relegates it to the extended test section.

Test 11 [Moving inv, 32 bit pat, no cache]

This test should be the most effective in finding errors that are data pattern sensitive. However, without cache its execution time is excessively long.

Theory of Operation

Bootstrap and setup code is used to load Memtest86. This code loads the test, sets up memory management registers and does miscellaneous setup. When the load and setup are complete the memory map is as follows:

0x000	|-----------------------------------------------|
	|	Stack (4k)				|
0x1000	|-----------------------------------------------|
	|	Memtest-text (24k) Origin  0x1000	|
0x5000	-------------------------------------------------
	|	Memtest-data (2k)  Origin  0x7000	|
0x5400	-------------------------------------------------
	|	Memtest-text (24k) Origin  0x108800	|
0x9400	-------------------------------------------------
	|	Memtest-data (2k)  Origin  0x10e800	|
0x9800	-------------------------------------------------
	|	Common variables (1k)			|
0x9c00	-------------------------------------------------

Relocation of the test is accomplished by using two copies of the test code that have been built to execute at different addresses (different origins). When the test is started, the code with an origin of 0x1000 is executed. At the end of the testing phase the memory block from 0x1000 to 0xe400 is copied to 0x101000, the stack is set to 0x101000 and then we jump to address 0x108800 (the code with an origin of 0x108800). When the code is relocated only the first 640k of memory is tested. When this test is complete then the code is moved back to 0x1000, the stack is set back to 0x1000 and then we jump to 0x1000 (the code with an origin of 0x1000).

When Memtest86 is loaded into memory it first scans memory to find all segments of available read/write memory (DRAM). DRAM is identified by reading a location and then writing its complement. If at least one bit in each byte changes then we assume that it is DRAM. To save time we only do this check every 1k bytes. All memory from 0xa0000 to 0xfffff is skipped. Each segment of memory is displayed on the right side of the screen. All segments of memory that are found will be tested regardless of size. The memory scan is limited to the maximum memory size supported by the motherboard.
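
The probe described above can be sketched in C as shown below. This is a simplified illustration, not the Memtest86 source: the function names are hypothetical, restoring the original value after the probe is an assumption, and skipping the 0xa0000 to 0xfffff range is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if the word at p behaves like read/write memory:
   after writing the complement, at least one bit in each byte must
   have changed. */
int looks_like_dram(volatile uint32_t *p)
{
    uint32_t before, after, changed;
    int byte;

    assert(p != NULL);

    before = *p;
    *p = (uint32_t)~before;
    after = *p;
    *p = before;                 /* restore (assumption of this sketch) */

    changed = before ^ after;
    for (byte = 0; byte < 4; byte++)
        if (((changed >> (8 * byte)) & 0xffu) == 0)
            return 0;            /* this byte did not change at all */
    return 1;
}

/* Probe a region once every 1k bytes and count the probes that pass. */
size_t count_dram_probes(volatile uint32_t *base, size_t bytes)
{
    size_t off, hits = 0;

    assert(base != NULL);
    for (off = 0; off + sizeof(uint32_t) <= bytes; off += 1024)
        if (looks_like_dram(base + off / sizeof(uint32_t)))
            hits++;
    return hits;
}
```

On a 4k buffer of ordinary memory, count_dram_probes makes four probes (at offsets 0, 1024, 2048 and 3072) and all four pass.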


Problem Reporting - Contact Information

Due to the growing popularity of Memtest86 I am being inundated by questions, feedback, problem reports and requests for enhancements. Memtest86 is a side project and often my day job interferes with Memtest86 support. To help me keep up with this project, please use the following guidelines.

Problems/Bugs

Before submitting a problem report please check the Known Problems section to see if this problem has already been reported. Be sure to include the version number and also any details that may be relevant.

With some PCs Memtest86 will just die with no hints as to what went wrong. Without any details it is nearly impossible to fix these failures. Fixing these problems will require debugging assistance on your part. There is no point in reporting these failures unless you have a Linux system and would be willing to assist me in finding the failure.

Enhancements

If you would like to request an enhancement please see if it is already on the Planned Features List before sending your request. All requests will be considered, but not all will be implemented. If you are interested in contributing code please contact me so that the integration can be co-ordinated.

Feedback

I have received a lot of feedback about the effectiveness of various tests. I am still interested in hearing about failures that only a single test was able to detect. Of course, gratitude, praise and donations are always accepted.

Questions

Ask and ye shall receive, but it may take a while.

Chris Brady, Email: crsbrady@earthlink.net


Donations

With considerable reluctance I am resorting to a low key solicitation for donations. It never has been my intent to profit from this program and I am pleased that Memtest86 has been helpful. However, the time required to support this program has grown significantly this year. I also have the modest cost of hosting this web-site that I would like to recover. So if you find Memtest86 useful and you feel inclined to make a small PayPal donation please do so. Use my e-mail address "crsbrady@earthlink.net" for the recipient.


Known Problems

There is a problem with memory sizing on some old 486 motherboards. A compile time option (BIOS_MEMSZ) for obtaining the last memory address from the BIOS was added in version 2.1. This will fix this problem for some but not all motherboards. In version 2.2 an online option for setting the upper memory address was added. Press the "c" key immediately after the test starts and use the menu options to set the upper memory limit.

Sometimes when booting from a floppy disk the following messages scroll up on the screen:

        X:8000
        AX:0212
        BX:8600
        CX:0201
        DX:0000

This is the BIOS reporting floppy disk read errors. Either re-write or toss the floppy disk.

Memtest86 does not support more than 2gb of memory. There are a number of difficult problems with crossing the 2gb boundary that will need to be fixed to support 2gb+ memory sizes.

Memtest86 has not been designed for or tested with error correcting (ECC) memory. With ECC enabled the test will not be able to detect single bit errors but the test should otherwise execute correctly.

Memtest86 has no support for multiple processors. Memtest86 should run without problems, but it will only use one CPU.

Changes in the compiler and loader have caused problems with Memtest86 resulting in both build failures and errors in execution. A binary image (precomp.bin) of the test is included and may be used if problems are encountered.


Planned Features List

This is a list of enhancements planned for future releases of Memtest86. There is no timetable for when these will be implemented, if ever.


Change Log

Enhancements in v2.8 (18/Oct/2001)

Enhancements in v2.7 (12/Jul/2001)

Enhancements in v2.6 (25/May/2001)

Enhancements in v2.5 (13/Dec/00)

Enhancements in v2.4

Enhancements in v2.3

Enhancements in v2.2

Enhancements in v2.1

Enhancements in v2.0

Enhancements in v1.5

Enhancements in v1.4

Enhancements in v1.3


Acknowledgments

The initial versions of the source files bootsect.S, setup.S, head.S and build.c are from the Linux 1.2.1 kernel and have been heavily modified.

Doug Sisk provided code to support a console connected via a serial port.

Code to create BadRAM patterns was provided by Rick van Rein.

Screen buffer code was provided by Jani Averbach.

Eric Biederman reworked the build process, making it far simpler, and added the ability to produce a network bootable ELF image.