A Highly Effective Memory Check Method



sailor_forever  sailing_9806#163.com

http://blog.csdn.net/sailor_8318/archive/2009/10/12/4660555.aspx

 

 

1    Introduction
1.1    Purpose
1.2    Revision Information
1.3    Reference
1.4    Abbreviations
2    General
3    Common memory problems
3.1    Electrical wiring problems
3.2    Integrity problems
4    Test strategy
5    Data bus test
6    Address bus test
7    Integrity test
8    Flash Test
9    Data Width
10    Where to execute test code



1    Introduction
1.1    Purpose
This article shows how to check for the most common memory problems with a set of three efficient and portable memory test functions. It applies to both RAM and flash devices.
1.2    Revision Information
1.3    Reference
http://article.ednchina.com/2005-12/20051230085057.htm
http://www.6502.org/source/general/address_test.html
http://www.freepatentsonline.com/4891811.html
http://www.freepatentsonline.com/7526689.html
http://www.embeddedrelated.com/usenet/embedded/show/99558-1.php
http://www.netrino.com/Embedded-Systems/How-To/Memory-Test-Suite-C
1.4    Abbreviations

2    General
For an embedded system, it is desirable to test any onboard RAM at least as often as the system is hard reset. It is up to the embedded software developer, then, to figure out what can go wrong and to design a suite of tests that will uncover potential problems.
At first glance, writing a memory test may seem like a fairly simple thing. However, as you look at the problem more closely you will realize that it can be difficult to detect subtle memory problems with a simple test.
The purpose of a memory test is to confirm that each storage location in a memory device is working. In other words, if you store the value 50 at a particular address, you expect to find that value stored there until another value is written to that same address. The basic idea behind any memory test, then, is to write some set of data to each address in the memory device and verify the data by reading it back. If all the values read back are the same as those that were written, then the memory device is said to pass the test. As you will see, it is only through careful selection of the set of data values and test sequence that you can be sure that a passing result is meaningful.

3    Common memory problems
Before implementing any of the possible test algorithms, you should be familiar with the types of memory problems that are likely to occur. One common misconception among software engineers is that most memory problems occur within the chips themselves. But nowadays, the manufacturers of memory devices perform a variety of post-production tests on each batch of chips. If there is a problem with a particular batch, it is extremely unlikely that one of the bad chips will make its way into your system.
The most common source of actual memory problems is the circuit board. Typical circuit board problems are problems with the wiring between the processor and memory. These are the problems that a good memory test algorithm should be able to detect.

 

3.1    Electrical wiring problems
An electrical wiring problem could be caused by an error in the design or production of the board, or by damage received after manufacture. Each of the wires that connect the memory device to the processor is one of three types: an address line, a data line, or a control line. The address and data lines are used to select the memory location and to transfer the data, respectively. The control lines tell the memory device whether the processor wants to read or write the location and precisely when the data will be transferred. Unfortunately, one or more of these wires could be improperly routed or damaged in such a way that it is either shorted (for example, connected to another wire on the board) or open (not connected to anything). These problems are often caused by a bit of solder splash or a broken trace, respectively. Both cases are illustrated in Figure 1.
 
Figure 1  Possible wiring problems
Problems with the electrical connections to the processor will cause the memory device to behave incorrectly. Data may be stored incorrectly, stored at the wrong address, or not stored at all. Each of these symptoms can be explained by wiring problems on the data, address, and control lines, respectively.
If the problem is with a data line, several data bits may appear to be "stuck together" (for example, two or more bits always contain the same value, regardless of the data transmitted). Similarly, a data bit may be either "stuck high" (always 1) or "stuck low" (always 0). These problems can be detected by writing a sequence of data values designed to test that each data pin can be set to 0 and 1, independently of all the others.
If an address line has a wiring problem, the contents of two memory locations or two blocks may appear to overlap. In other words, data written to one address will actually affect or even overwrite the contents of another address instead. This happens because an address bit that is shorted or open will cause the memory device to see an address different than the one selected by the processor.
Another possibility is that one of the control lines is shorted or open. The operation of many control signals is specific to either the processor or the memory architecture. Fortunately, if there is a problem with a control line, the memory will probably not work at all, so the fault is unlikely to stay hidden the way data and address line faults sometimes do.

 

3.2    Integrity problems
Occasionally a chip is damaged by physical or electrical stress, but this is rare. An integrity test is only meaningful once the data bus and address bus are known to work. Integrity problems can then be identified by testing every single bit with both 0 and 1. This is very time consuming, so in practice it is acceptable to test only a subset of locations.
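Testing only part of the locations can be sketched as a strided check. This is a minimal illustration, not the integrity test presented later in this article: the function name `sample_integrity`, its parameters and its return convention are assumptions made for the example, and it assumes 32-bit wide memory.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sampled integrity check (illustrative; name and signature are assumptions).
 * Visits every `stride`-th 32-bit word so that each sampled bit holds both
 * 0 and 1 once. The contents of the sampled words are destroyed.
 * Returns the word index of the first failure, or -1 if all samples pass.
 */
long sample_integrity(volatile uint32_t *base, size_t words, size_t stride)
{
    size_t i;

    if (stride == 0)
        stride = 1;                      /* avoid an endless loop */

    /* Pass 1: fill the sampled words with a value derived from the index. */
    for (i = 0; i < words; i += stride)
        base[i] = (uint32_t)(i + 1);

    /* Pass 2: verify the value, then store its inverse. */
    for (i = 0; i < words; i += stride) {
        if (base[i] != (uint32_t)(i + 1))
            return (long)i;
        base[i] = ~(uint32_t)(i + 1);
    }

    /* Pass 3: verify the inverted value. */
    for (i = 0; i < words; i += stride) {
        if (base[i] != ~(uint32_t)(i + 1))
            return (long)i;
    }
    return -1;
}
```

A larger stride shortens the test at the cost of coverage; a stride of 1 visits every word.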

4    Test strategy
Before going on, let's quickly review the types of memory problems we must be able to detect. Memory chips only rarely have internal errors, but, if they do, they will be detected by any test.
By carefully selecting test data and the order in which the addresses are tested, it is possible to detect all of the memory problems described above. It is usually best to break memory test into small, single-minded pieces. This helps to improve the efficiency of the overall test and the readability of the code. More specific tests can also provide more detailed information about the source of the problem.
It is best to have three individual memory tests: a data bus test, an address bus test, and an integrity test. The first two tests detect electrical wiring problems while the third is intended to detect missing chips and integrity failures. As an unintended consequence, the integrity test will also uncover problems with the control bus wiring, though it will not provide useful information about the source of such a problem.
The order in which you execute these three tests is important. The proper order is: data bus test first, followed by the address bus test, and then the integrity test. That's because:
For the data bus test, all data are written to one fixed address. Even if something is wrong with the address lines, the data are written to and read back from the same location, so the data bus test result is trustworthy.
The address bus test assumes a working data bus. If the data bus has not been verified and the address bus test fails, you cannot tell whether the data bus or the address bus is the cause.
The integrity test results are meaningless unless both the address and data buses are known to be good.
In addition, for the data and address bus tests, no other write is allowed between a write and its read back; otherwise the test result is not valid.
By looking at the data value or address at which the test failed, the designer should be able to quickly isolate the problem on the circuit board.
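The ordering rule can be sketched as a small runner that stops at the first failure, since the result of a later test means nothing once an earlier one has failed. Everything here is hypothetical scaffolding for illustration: the `mem_test_step` structure, `run_memory_tests()` and the two stub tests are not part of the test functions in this article, which take additional output parameters.

```c
#include <stddef.h>

/* Hypothetical signature: each test returns 0 on success, non-zero on failure. */
typedef int (*mem_test_fn)(void);

struct mem_test_step {
    const char *name;    /* used only for reporting */
    mem_test_fn run;
};

/* Stand-in tests so the sketch is self-contained. */
int stub_pass(void) { return 0; }
int stub_fail(void) { return 1; }

/*
 * Run the steps in the given order and stop at the first failure.
 * Returns the index of the failing step, or -1 if every step passed.
 */
int run_memory_tests(const struct mem_test_step *steps, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (steps[i].run() != 0)
            return (int)i;
    }
    return -1;
}
```

The caller lists the steps as data bus, address bus, integrity; the returned index then tells which stage to investigate first.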

5    Data bus test

The first thing we want to test is the data bus wiring. We need to confirm that any value placed on the data bus by the processor is correctly received by the memory device at the other end. The most obvious way to test that is to write all possible data values and verify that the memory device stores each one successfully. However, that is not the most efficient test available. A faster method is to test the bus one bit at a time. The data bus passes the test if each data bit can be set to 0 and 1, independently of the other data bits.
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000
Table 1 Consecutive data values for the walking 1's test
A good way to test each bit independently is to perform the so-called "walking 1's test." Table 1 shows the data patterns used in an 8-bit version of this test. The number of data values to test is the same as the width of the data bus. This reduces the number of test patterns from 2^n to n, where n is the width of the data bus.
Since we are testing only the data bus at this point, all of the data values can be written to the same address. Any address within the memory device will do.
To perform the walking 1's test, simply write the first data value in the table, verify it by reading it back, write the second value, verify, and so on. When you reach the end of the table, the test is complete.
On its own, however, this method cannot tell whether a data line is stuck high, stuck low, or shorted. When a line driven to 1 is shorted to a line driven to 0, the resulting level is not defined. Suppose the written value is 00000001 and the value read back is 0: you cannot tell whether the first bit is stuck low or shorted to another line.
If all bits are written with the same value, shorts have no effect. An all-0 test therefore identifies any stuck-high bit, and an all-1 test identifies any stuck-low bit, but neither will reveal a short.
Once stuck-high and stuck-low bits have been ruled out, the walking 1's test identifies any shorted lines.

/**********************************************************************
 *
 * Function:    TestDataBus()
 *
 * Description: Test the data bus wiring in a memory region by
 *              performing a walking 1's test at a fixed address
 *              within that region.  stuck high, stuck low and any shorted
 *              lines can be identified.
 *
 * Inputs:      
 * Outputs:     test pattern, actual value and error lines
 *
 * Returns:     0 if the test succeeds. 
 *                  1 if the test fails.
 *
 **********************************************************************/
int TestDataBus (unsigned int *errline, u32 *expected, u32 *actual)
{
    vu32    *addr;
    u32    val;
    u32    readback;
    unsigned int lineindex;

    addr = (vu32 *)CFG_MEMTEST_START;

    /* stuck high test, all 0 */
    *addr = 0;
    readback = *addr;
    if (readback != 0) {
        *expected = 0;
        *actual = readback;
        printf ("Data bus stuck high test fail: expected 0x%08lx, actual 0x%08lx\n", (unsigned long)*expected, (unsigned long)*actual);
        return 1;
    }

    /* stuck low test, all 1 */
    *addr = ~(u32)0;
    readback = *addr;
    if (readback != ~(u32)0) {
        *expected = ~(u32)0;
        *actual = readback;
        printf ("Data bus stuck low test fail: expected 0x%08lx, actual 0x%08lx\n", (unsigned long)*expected, (unsigned long)*actual);
        return 1;
    }

    /* shorted line test */
    for (lineindex = 0, val = 1; val != 0; val <<= 1, lineindex++) {
        /* walking 1 */
        *addr = val;
        readback = *addr;
        if (readback != val) {
            *expected = val;
            *actual = readback;
            *errline = lineindex;
            printf ("Shorted at data line %u: expected 0x%08lx, actual 0x%08lx\n", lineindex, (unsigned long)val, (unsigned long)readback);
            return 1;
        }

        /* walking 0: the expected value is the inverted pattern */
        *addr = ~val;
        readback = *addr;
        if (readback != ~val) {
            *expected = ~val;
            *actual = readback;
            *errline = lineindex;
            printf ("Shorted at data line %u: expected 0x%08lx, actual 0x%08lx\n", lineindex, (unsigned long)~val, (unsigned long)readback);
            return 1;
        }
    }

    /* all patterns read back correctly */
    return 0;
}

List 1 Data bus test
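The three-step order of the data bus test (all 0, all 1, then walking patterns) can be tried without hardware by pushing values through a software fault model. The sketch below is only an illustration: the `bus_fault` structure, `bus_transfer()` and `check_data_bus()` are hypothetical names, the bus is 8 bits wide, and shorted lines are assumed to behave as a wired-AND (any shorted line driven to 0 pulls the whole group to 0), which real boards do not guarantee.

```c
#include <stdint.h>

/* Hypothetical fault model for an 8-bit data bus. */
struct bus_fault {
    uint8_t stuck_high;   /* lines that always read 1 */
    uint8_t stuck_low;    /* lines that always read 0 */
    uint8_t shorted;      /* lines tied together (assumed wired-AND) */
};

/* Value that would be read back after writing `value` through the faulty bus. */
uint8_t bus_transfer(uint8_t value, const struct bus_fault *f)
{
    /* If any line of the shorted group is driven 0, the whole group reads 0. */
    if (f->shorted != 0 && (value & f->shorted) != f->shorted)
        value &= (uint8_t)~f->shorted;
    value |= f->stuck_high;
    value &= (uint8_t)~f->stuck_low;
    return value;
}

enum bus_result { BUS_OK, BUS_STUCK_HIGH, BUS_STUCK_LOW, BUS_SHORTED };

/* Same order as the listing above: all 0, all 1, then walking 1 and walking 0. */
enum bus_result check_data_bus(const struct bus_fault *f)
{
    uint8_t val;

    if (bus_transfer(0x00, f) != 0x00)
        return BUS_STUCK_HIGH;
    if (bus_transfer(0xFF, f) != 0xFF)
        return BUS_STUCK_LOW;

    for (val = 1; val != 0; val = (uint8_t)(val << 1)) {
        if (bus_transfer(val, f) != val)
            return BUS_SHORTED;
        if (bus_transfer((uint8_t)~val, f) != (uint8_t)~val)
            return BUS_SHORTED;
    }
    return BUS_OK;
}
```

Because the uniform patterns run first, a failure in the walking stage can be attributed to a short rather than to a stuck line.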

6    Address bus test
After confirming that the data bus works properly, you should next test the address bus. Remember that address bus problems lead to overlapping memory locations. Many possible addresses could overlap. However, it is not necessary to check every possible combination. You should instead follow the example of the data bus test above and try to isolate each address bit during testing. You just need to confirm that each of the address pins can be set to 0 and 1 without affecting any of the others.
The smallest set of addresses that will cover all possible combinations is the set of "power-of-two" addresses. These addresses are analogous to the set of data values used in the walking 1's test. The corresponding memory locations are 0001h, 0002h, 0004h, 0008h, 0010h, 0020h, and so on. In addition, address 0000h must also be tested. The possibility of overlapping locations makes the address bus test harder to implement. After writing to one of the addresses, you must check that none of the others has been overwritten.
It is important to note that not all of the address lines can be tested in this way. Part of the address (the leftmost bits) selects the memory chip itself. Another part (the rightmost bits) may not be significant if the data bus width is greater than eight bits. These extra bits will remain constant throughout the test and reduce the number of test addresses. For example, if the processor has 32 address bits, it can address up to 4GB of memory. If you want to test a 128K block of memory, the 15 most-significant address bits will remain constant. In that case, only the 17 rightmost bits of the address bus can actually be tested.
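The arithmetic in this example can be checked with a few lines of code. The helper below is a sketch with an assumed name (`testable_address_lines`); it counts the power-of-two byte offsets that fall inside a region, starting at the width of one memory word, and expects `word_bytes` to be a power of two.

```c
#include <stdint.h>

/*
 * Count the address lines that change while walking power-of-two offsets
 * inside a region of `region_bytes`. Offsets smaller than one word are
 * skipped because those low address bits are not significant on a bus
 * wider than 8 bits.
 */
unsigned int testable_address_lines(uint32_t region_bytes, uint32_t word_bytes)
{
    unsigned int lines = 0;
    uint32_t offset;

    /* `offset != 0` stops the loop if the shift overflows 32 bits. */
    for (offset = word_bytes; offset != 0 && offset < region_bytes; offset <<= 1)
        lines++;

    return lines;
}
```

For a 128K block, `testable_address_lines(128 * 1024, 1)` gives the 17 lines mentioned above; with a 32-bit data bus (`word_bytes` of 4) the two rightmost lines drop out and 15 remain.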
To confirm that no two memory locations overlap, you should first write some initial data value at each power-of-two offset within the device. Then write a new value (an inverted copy of the initial value is a good choice) to the first test offset, and verify that the initial data value is still stored at every other power-of-two offset. If you find a location, other than the one just written, that contains the new data value, you have found a problem with the current address bit. If no overlapping is found, repeat the procedure for each of the remaining offsets.

/**********************************************************************
 *
 * Function:    TestAddressBus()
 *
 * Description: Test the address bus wiring in a memory region by
 *              performing a walking 1's test on the relevant bits
 *              of the address and checking for aliasing. This test
 *              will find single-bit address failures such as stuck
 *              -high, stuck-low, and shorted pins.
 *
 * Notes:       For best results, the selected base address should
 *              have enough LSB 0's to guarantee single address bit
 *              changes.  For example, to test a 64-Kbyte region,
 *              select a base address on a 64-Kbyte boundary.  Also,
 *              select the region size as a power-of-two--if at all
 *              possible.
 *
 * Inputs:      
 * Outputs:     test pattern, actual value and error lines and address
 *
 * Returns:     0 if the test succeeds. 
 *                  1 if the test fails.
 *
 * ## NOTE ##   Be sure to specify start and end
 *              addresses such that addr_mask has
 *              lots of bits set. For example an
 *              address range of 01000000 02000000 is
 *              bad while a range of 01000000
 *              01ffffff is perfect.
 **********************************************************************/

int TestAddressBus (u32 *erraddr, unsigned int *errline, u32 *expected, u32 *actual)
{
    vu32     *start, *end;
    u32     addr_mask;
    u32     offset;
    u32     test_offset;
    u32     pattern;
    u32     temp;
    u32     anti_pattern;
    unsigned int lineindex;
    unsigned int testline;

    start = (vu32 *)(CFG_SDRAM_BASE + (SDRAM_MAX_SIZE >> 1));
    end = (vu32 *)(CFG_SDRAM_BASE + SDRAM_MAX_SIZE - 1);

    printf ("Testing addr range: 0x%.8lx ... 0x%.8lx:\n", (unsigned long)start, (unsigned long)end);

    addr_mask = ((unsigned long)end - (unsigned long)start)/sizeof(vu32);
    pattern = 0xaaaaaaaa;
    anti_pattern = 0x55555555;

    printf("addr mask = 0x%.8lx\n", (unsigned long)addr_mask);

    /* Write the default pattern at each of the logical power-of-two offsets. */
    for (offset = 1; (offset & addr_mask) != 0; offset <<= 1) {
        start[offset] = pattern;
    }

    /* Check for address bits stuck high, or shorted lines that resolve to 0. */
    test_offset = 0;
    start[test_offset] = anti_pattern;
    /* Offsets count 32-bit words, so word offset 1 corresponds to address line 2. */
    lineindex = 2;
    for (offset = 1; (offset & addr_mask) != 0; offset <<= 1, lineindex++) {
        temp = start[offset];
        if (temp != pattern) {
            printf ("FAILURE at address 0x%08lx, bit %u: expected 0x%08lx, actual 0x%08lx\n", (unsigned long)&start[offset], lineindex, (unsigned long)pattern, (unsigned long)temp);
            *expected = pattern;
            *actual = temp;
            *errline = lineindex;
            *erraddr = (u32)(unsigned long)&start[offset];
            return 1;
        }
    }

    start[test_offset] = pattern;
    /* Now pattern at all logical power-of-two offsets and base */

    /* Check for addr bits stuck low or shorted. */
    for (test_offset = 1, testline = 2; (test_offset & addr_mask) != 0; test_offset <<= 1, testline++) {
        start[test_offset] = anti_pattern;

        /* If the base location changed, the line under test is stuck low
         * (or shorted to a line that pulls it to 0). */
        temp = start[0];
        if (temp != pattern) {
            printf ("FAILURE at address 0x%08lx, bit %u: expected 0x%08lx, actual 0x%08lx\n", (unsigned long)&start[test_offset], testline, (unsigned long)pattern, (unsigned long)temp);
            *expected = pattern;
            *actual = temp;
            *errline = testline;
            *erraddr = (u32)(unsigned long)&start[test_offset];
            return 1;
        }

        /* Check for addr bits shorted, whatever level the shorted lines resolve to. */
        for (offset = 1, lineindex = 2; (offset & addr_mask) != 0; offset <<= 1, lineindex++) {
            temp = start[offset];
            if ((temp != pattern) && (offset != test_offset)) {
                printf ("FAILURE at address 0x%08lx, bit %u: expected 0x%08lx, actual 0x%08lx\n", (unsigned long)&start[offset], lineindex, (unsigned long)pattern, (unsigned long)temp);
                *expected = pattern;
                *actual = temp;
                *errline = lineindex;
                *erraddr = (u32)(unsigned long)&start[offset];
                return 1;
            }
        }

        /* restore pattern at all logical power-of-two offsets */
        start[test_offset] = pattern;
    }

    return 0;

}
List 2 Address bus test
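The aliasing that the address bus test looks for can be reproduced on a host with a tiny simulated memory. This is a sketch under stated assumptions: the `sim_mem` structure, `sim_cell()` and `find_aliased_line()` are hypothetical, the memory has only 16 words, and the only fault modeled is an address line that the device always sees as 0. Line numbers here are bit positions of the word offset, not of the byte address.

```c
#include <stdint.h>

#define SIM_WORDS 16u                /* 4 address lines, hypothetical size */

struct sim_mem {
    uint32_t cell[SIM_WORDS];
    uint32_t stuck_low_mask;         /* offset bits the device always sees as 0 */
};

/* Map the offset the CPU asked for to the cell the faulty device selects. */
static uint32_t *sim_cell(struct sim_mem *m, uint32_t offset)
{
    return &m->cell[(offset & ~m->stuck_low_mask) & (SIM_WORDS - 1u)];
}

/* Returns -1 if no aliasing is seen, else the index of the faulty offset bit. */
int find_aliased_line(struct sim_mem *m)
{
    const uint32_t pattern = 0xAAAAAAAAu, anti = 0x55555555u;
    uint32_t offset, test_offset;
    int line;

    /* Initial pattern at offset 0 and at every power-of-two offset. */
    *sim_cell(m, 0) = pattern;
    for (offset = 1; offset < SIM_WORDS; offset <<= 1)
        *sim_cell(m, offset) = pattern;

    for (test_offset = 1, line = 0; test_offset < SIM_WORDS; test_offset <<= 1, line++) {
        *sim_cell(m, test_offset) = anti;

        /* The write landed on offset 0: this line is stuck low. */
        if (*sim_cell(m, 0) != pattern)
            return line;

        /* The write disturbed another power-of-two offset. */
        for (offset = 1; offset < SIM_WORDS; offset <<= 1) {
            if (offset != test_offset && *sim_cell(m, offset) != pattern)
                return line;
        }

        *sim_cell(m, test_offset) = pattern;   /* restore before the next bit */
    }
    return -1;
}
```

With `stuck_low_mask` set to 0x4, the write to offset 4 lands on offset 0 and the function reports line 2.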

7    Integrity test
Once you know that the address and data bus wiring are working, it is necessary to test the integrity of the memory device itself. The thing to test is that every bit in the device is capable of holding both 0 and 1. This is a fairly straightforward test to implement, but takes significantly longer to execute than the previous two.
For a complete integrity test, you must visit (write and verify) every memory location twice. You are free to choose any data value for the first pass, so long as you invert that value during the second. And since there is a possibility of missing memory chips, it is best to select a set of data that changes with (but is not equivalent to) the address. A simple example is an "increment test."
Offset    Value    Inverted Value
00h    00000001    11111110
01h    00000010    11111101
02h    00000011    11111100
03h    00000100    11111011
...    ...    ...
FEh    11111111    00000000
FFh    00000000    11111111
Table 2. Data values for an increment test
The offsets and corresponding data values for the increment test are shown in the first two columns of Table 2. The third column shows the inverted data values used during the second pass of this test. The latter represents a decrement test. There are many other possible choices of data, but the incrementing data pattern is adequate and easy to compute.

/**********************************************************************
 *
 * Function:    TestIntegrity()
 *
 * Description: Test the integrity of a physical memory device by
 *              performing an increment/decrement test over the
 *              entire region.  In the process every storage bit
 *              in the device is tested as a zero and a one.  The
 *              base address and the size of the region are
 *              selected by the caller.
 *
 * Notes:      
 *
 * Inputs:      start addr and end addr
 * Outputs:     test pattern, actual value, error address
 *
 * Returns:     0 if the test succeeds. 
 *                  1 if the test fails.
 *
 **********************************************************************/
int TestIntegrity (unsigned int *erraddr, unsigned int *expected, unsigned int *actual, vu32  *start, vu32  *end)
{
    u32     offset;
    u32     pattern;
    u32     temp;
    u32     anti_pattern;
    u32     num_words;

    printf ("Testing integrity: 0x%08lx ... 0x%08lx:\n", (unsigned long)start, (unsigned long)end);

    /* end is the address of the last word, so the count includes it */
    num_words = ((unsigned long)end - (unsigned long)start)/sizeof(vu32) + 1;

    /* Fill memory with a known pattern. */
    for (pattern = 1, offset = 0; offset < num_words; pattern++, offset++) {
        start[offset] = pattern;
    }

    /* Check each location and invert it for the second pass. */
    for (pattern = 1, offset = 0; offset < num_words; pattern++, offset++) {
        temp = start[offset];
        if (temp != pattern) {
            printf ("FAILURE at address 0x%08lx: expected 0x%08lx, actual 0x%08lx\n", (unsigned long)&start[offset], (unsigned long)pattern, (unsigned long)temp);
            *expected = pattern;
            *actual = temp;
            *erraddr = (unsigned int)(unsigned long)&start[offset];
            return 1;
        }

        anti_pattern = ~pattern;
        start[offset] = anti_pattern;
    }


    /* Check each location for the inverted pattern and zero it. */
    for (pattern = 1, offset = 0; offset < num_words; pattern++, offset++) {
        anti_pattern = ~pattern;
        temp = start[offset];
        if (temp != anti_pattern) {
            printf ("FAILURE at address 0x%08lx: expected 0x%08lx, actual 0x%08lx\n", (unsigned long)&start[offset], (unsigned long)anti_pattern, (unsigned long)temp);
            *expected = anti_pattern;
            *actual = temp;
            *erraddr = (unsigned int)(unsigned long)&start[offset];
            return 1;
        }

        start[offset] = 0;
    }

    return 0;

}
List 3. Integrity test

8    Flash Test
Of course, a memory test like the one described above is necessarily destructive. In the process of testing the memory, you must overwrite its prior contents. Since it is usually impractical to overwrite the contents of nonvolatile memories, the tests described in this article are generally used only for RAM testing. However, if the contents of a non-volatile memory device, such as flash or NVRAM, are backed up before the test and restored afterwards, these same algorithms can be used to test those devices as well.
But flash operation is not as simple as RAM reads and writes; it needs a driver for programming and erasing. Flash initialization is itself a complicated procedure that already uses part of the data and address buses, so if anything is wrong there, the program may not run at all.
Since a 0 can be changed back to a 1 only by an erase, a sector should be erased before anything is written to a location in it. In addition, if some sectors are protected, the protection should be removed before the test and restored afterwards.
If several test locations fall in the same sector, erasing it once is enough and saves time. Since not all sectors are the same size, a method is needed to calculate which sector a given location falls in.
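One way to find the sector for a given location is to describe the device as runs of equally sized sectors and walk that table. The sketch below is an assumption-laden illustration: `sector_region`, `flash_sector_of()` and the `example_map` layout (eight 8 KB boot sectors followed by sixty-three 64 KB sectors) are made up for the example and must be replaced by the geometry from the data sheet of the actual part.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one run of equally sized sectors. */
struct sector_region {
    uint32_t sector_size;   /* bytes per sector */
    uint32_t count;         /* number of sectors in this run */
};

/* Example layout only; take the real values from the device data sheet. */
const struct sector_region example_map[] = {
    { 0x2000u,  8u  },      /* 8 x 8 KB boot sectors */
    { 0x10000u, 63u },      /* 63 x 64 KB main sectors */
};

/*
 * Return the index of the sector that contains byte `offset` (relative to
 * the start of the device), or -1 if the offset lies beyond the device.
 */
long flash_sector_of(const struct sector_region *map, size_t regions, uint32_t offset)
{
    long index = 0;
    size_t i;

    for (i = 0; i < regions; i++) {
        uint32_t span = map[i].sector_size * map[i].count;

        if (offset < span)
            return index + (long)(offset / map[i].sector_size);

        offset -= span;                 /* move on to the next run */
        index += (long)map[i].count;
    }
    return -1;
}
```

A test can remember the last sector index it erased and skip the erase when the next test location maps to the same index.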

9    Data Width
The test patterns and address pointers are all closely tied to the data width, so the code above needs minor modification for memories of a different width: just replace u32 with the corresponding data type. For example, for 16-bit flash, use u16 instead of u32.
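One way to keep the width change in a single place is a typedef that the pattern loop is written against. This is a minimal sketch, assuming the standard `<stdint.h>` types rather than the u16/u32 names used in this article; `bus_t` and `walk_ones()` are hypothetical names.

```c
#include <stdint.h>

/* Change this one typedef to match the device: uint8_t, uint16_t or uint32_t. */
typedef uint16_t bus_t;

/*
 * Walking 1's at a single location, sized by bus_t.
 * Returns -1 on success, or the index of the first data line that failed.
 */
int walk_ones(volatile bus_t *addr)
{
    bus_t val;
    int line;

    /* The cast truncates the shifted value back to the bus width, so the
     * loop ends once the single 1 bit has walked off the top. */
    for (val = 1, line = 0; val != 0; val = (bus_t)(val << 1), line++) {
        *addr = val;
        if (*addr != val)
            return line;
    }
    return -1;
}
```

The number of loop iterations follows the typedef automatically: 16 patterns for `uint16_t`, 32 for `uint32_t`.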
For 64-bit SDRAM, the << operator applies only to integer types, and on most embedded CPUs there is no 64-bit integer type, only a 64-bit double. So for a 64-bit data bus, the same test can be executed on both the even and the odd 32-bit word address to cover all 64 data lines.
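The even/odd word approach might look like the sketch below. It rests on assumptions that must be checked against the actual hardware: `base` is 8-byte aligned, accesses are uncached, and the memory controller places the word at `base[0]` on one half of the 64-bit bus and the word at `base[1]` on the other half (which half depends on endianness and board wiring). The function names are hypothetical.

```c
#include <stdint.h>

/* Walking 1's on one 32-bit word. Returns 0 on success, else the failed pattern. */
static uint32_t walk_ones_32(volatile uint32_t *addr)
{
    uint32_t val;

    for (val = 1; val != 0; val <<= 1) {
        *addr = val;
        if (*addr != val)
            return val;
    }
    return 0;
}

/*
 * Cover a 64-bit data bus with two 32-bit passes.
 * Returns 0 if both halves pass, 1 if the even word (base[0]) fails,
 * 2 if the odd word (base[1]) fails.
 */
int test_data_bus_64(volatile uint32_t *base)
{
    if (walk_ones_32(&base[0]) != 0)
        return 1;
    if (walk_ones_32(&base[1]) != 0)
        return 2;
    return 0;
}
```

The return value narrows a failure down to one half of the bus; the failing pattern from `walk_ones_32()` could be passed up as well if the exact line is needed.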

10    Where to execute test code
Unfortunately, it is not always possible to write memory tests in a high-level language. For example, the C language requires a stack, and a stack itself requires working RAM. Using C might still be reasonable in a system with more than one memory device: you might create the stack in an area of RAM that is already known to be working while testing another memory device.
Most embedded CPUs, such as the AT91RM9200, S3C44B0 and MPC8270, have internal SRAM or DPRAM. This RAM works at reset without any initialization, because the CPU accesses it through internal data and address buses, where board wiring problems cannot occur. So the initial stack can be set up in this kind of RAM. Of course, you cannot run the test program from the RAM under test; if the program can already run from it, there is little need to test it!
If you cannot assume enough working RAM for the stack and data needs of the test program, then you will need to rewrite these memory test routines entirely in assembly language.
Boot code is generally stored in NOR flash, and the program executes from flash on reset. If something is wrong with the flash, the program cannot run at all, so the problem is not easy to identify. For a CPU with internal working RAM, one option is to load the memory test program into internal RAM through an emulator and run it from there. Even without internal RAM, the emulator can initialize external RAM, as long as nothing is wrong with it. This method is very effective for diagnosing flash hardware problems during development, but it is not applicable to products already on the market.
Some CPUs can boot from internal ROM. This mode is generally used to download a program for the first time, when flash is empty or no emulator is available to program it. In this mode the test program can be loaded into internal RAM to test the flash.
Some CPUs can boot from an IIC EEPROM. This generally holds only a small piece of boot code, which downloads the actual program from a host into RAM. If there is any NOR or NAND flash in such a system, it is most probably used to store data.
The need for memory testing is most apparent during the product development stage, when the reliability of the hardware and its design are still unproven. However, memory is one of the most critical resources in any embedded system, so it may also be desirable to include a memory test in the final release of your software. In that case, the memory test should be run each time the system is powered on or reset.
