C strings, char * versus char[]: an excerpt from "C Primer Plus"

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 07:12

Defining Strings Within a Program

As you probably noticed when you read Listing 11.1, there are many ways to define a string. The principal ways are using string constants, using char arrays, using char pointers, and using arrays of character strings. A program should make sure there is a place to store a string, and we will cover that topic, too.

Character String Constants (String Literals)

A string constant, also termed a string literal, is anything enclosed in double quotation marks. The enclosed characters, plus a terminating \0 character automatically provided by the compiler, are stored in memory as a character string. The program uses several such character string constants, most often as arguments for the printf() and puts() functions. Note, too, that you can use #define to define character string constants.

Recall that ANSI C concatenates string literals if they are separated by nothing or by whitespace. For example,


char greeting[50] = "Hello, and"" how are"  " you"
                    " today!";

is equivalent to this:


char greeting[50] = "Hello, and how are you today!";

If you want to use a double quotation mark within a string, precede the quotation mark with a backslash, as follows:


printf("\"Run, Spot, run!\" exclaimed Dick.\n");

This produces the following output:


"Run, Spot, run!" exclaimed Dick.

Character string constants are placed in the static storage class, which means that if you use a string constant in a function, the string is stored just once and lasts for the duration of the program, even if the function is called several times. The entire quoted phrase acts as a pointer to where the string is stored. This action is analogous to the name of an array acting as a pointer to the array's location. If this is true, what kind of output should the program in Listing 11.2 produce?

Listing 11.2. The quotes.c Program
/* quotes.c -- strings as pointers */
#include <stdio.h>
int main(void)
{
    printf("%s, %p, %c\n", "We", "are", *"space farers");
    return 0;
}

The %s format should print the string We. The %p format produces an address. So if the phrase "are" is an address, then %p should print the address of the first character in the string. (Pre-ANSI implementations might have to use %u or %lu instead of %p.) Finally, *"space farers" should produce the value stored at the address pointed to, which should be the first character of the string "space farers". Does this really happen? Well, here is the output:


We, 0x0040c010, s

Character String Arrays and Initialization

When you define a character string array, you must let the compiler know how much space is needed. One way is to specify an array size large enough to hold the string. The following declaration initializes the array m1 to the characters of the indicated string:


const char m1[40] = "Limit yourself to one line's worth.";

The const indicates the intent to not alter this string.

This form of initialization is short for the standard array initialization form:


const char m1[40] = {  'L','i', 'm', 'i', 't', ' ', 'y', 'o', 'u', 'r', 's', 'e', 'l',
'f', ' ', 't', 'o', ' ', 'o', 'n', 'e', ' ',
'l', 'i', 'n', 'e', '\'', 's', ' ', 'w', 'o', 'r',
't', 'h', '.', '\0'
};

Note the closing null character. Without it, you have a character array, but not a string.

When you specify the array size, be sure that the number of elements is at least one more (that null character again) than the string length. Any unused elements are automatically initialized to 0 (which in char form is the null character, not the zero digit character). See Figure 11.1.

Figure 11.1. Initializing an array.



Often, it is convenient to let the compiler determine the array size; recall that if you omit the size in an initializing declaration, the compiler determines the size for you:


const char m2[] = "If you can't think of anything, fake it.";

Initializing character arrays is one case when it really does make sense to let the compiler determine the array size. That's because string-processing functions typically don't need to know the size of the array because they can simply look for the null character to mark the end.

Note that the program in Listing 11.1 had to assign a size explicitly for the array name:


#define LINELEN 81        // maximum string length + 1
...
char name[LINELEN];

Because the contents for name are to be read when the program runs, the compiler has no way of knowing in advance how much space to set aside unless you tell it. There is no string constant present whose characters the compiler can count, so we gambled that 80 characters would be enough to hold the user's name. When you declare an array, the array size must evaluate to an integer constant. You can't use a variable that gets set at runtime. The array size is locked into the program at compile time. (Actually, with C99 you could use a variable-length array, but you still have no way of knowing in advance how big it has to be.)


int n = 8;
char cakes[2 + 5];  /* valid, size is a constant expression  */
char crumbs[n];     /* invalid prior to C99, a VLA after C99 */

The name of a character array, like any array name, yields the address of the first element of the array. Therefore, the following holds for the array m1:


m1 == &m1[0] , *m1 == 'L', and *(m1+1) == m1[1] == 'i'

Indeed, you can use pointer notation to set up a string. For example, Listing 11.1 uses the following declaration:


const char *m3 = "\nEnough about me -- what's your name?";

This declaration is very nearly the same as this one:


char m3[] = "\nEnough about me -- what's your name?";

Both declarations amount to saying that m3 is a pointer to the indicated string. In both cases, the quoted string itself determines the amount of storage set aside for the string. Nonetheless, the forms are not identical.

Array Versus Pointer

What is the difference, then, between an array and a pointer form? The array form (m3[]) causes an array of 38 elements (one for each character plus one for the terminating '\0') to be allocated in the computer memory. Each element is initialized to the corresponding character. Typically, what happens is that the quoted string is stored in a data segment that is part of the executable file; when the program is loaded into memory, so is that string. The quoted string is said to be in static memory. But the memory for the array is allocated only after the program begins running. At that time, the quoted string is copied into the array. (Chapter 12, "Storage Classes, Linkage, and Memory Management," will discuss memory management more fully.) Hereafter, the compiler will recognize the name m3 as a synonym for the address of the first array element, &m3[0]. One important point here is that in the array form, m3 is an address constant. You can't change m3, because that would mean changing the location (address) where the array is stored. You can use operations such as m3+1 to identify the next element in an array, but ++m3 is not allowed. The increment operator can be used only with the names of variables, not with constants.

The pointer form (*m3) also causes 38 elements in static storage to be set aside for the string. In addition, once the program begins execution, it sets aside one more storage location for the pointer variable m3 and stores the address of the string in the pointer variable. This variable initially points to the first character of the string, but the value can be changed. Therefore, you can use the increment operator. For instance, ++m3 would point to the second character (E).

In short, initializing the array copies a string from static storage to the array, whereas initializing the pointer merely copies the address of the string.

Are these differences important? Often they are not, but it depends on what you try to do. See the following discussion for some examples.

Array and Pointer Differences

Let's examine the differences between initializing a character array to hold a string and initializing a pointer to point to a string. (By "pointing to a string," we really mean pointing to the first character of a string.) For example, consider these two declarations:


char heart[] = "I love Tillie!";
char *head = "I love Millie!";

The chief difference is that the array name heart is a constant, but the pointer head is a variable. What practical difference does this make?

First, both can use array notation:


for (i = 0; i < 6; i++)
    putchar(heart[i]);
putchar('\n');
for (i = 0; i < 6; i++)
    putchar(head[i]);
putchar('\n');

This is the output:


I love
I love

Next, both can use pointer addition:


for (i = 0; i < 6; i++)
    putchar(*(heart + i));
putchar('\n');
for (i = 0; i < 6; i++)
    putchar(*(head + i));
putchar('\n');

Again, the output is as follows:


I love
I love

Only the pointer version, however, can use the increment operator:


while (*(head) != '\0')  /* stop at end of string            */
    putchar(*(head++));  /* print character, advance pointer */

This produces the following output:


I love Millie!

Suppose you want head to agree with heart. You can say


head = heart;  /* head now points to the array heart */

This makes the head pointer point to the first element of the heart array.

However, you cannot say


heart = head;  /* illegal construction */

The situation is analogous to x = 3; versus 3 = x;. The left side of the assignment statement must be a variable or, more generally, an lvalue, such as *p_int. Incidentally, head = heart; does not make the Millie string vanish; it just changes the address stored in head. Unless you've saved the address of "I love Millie!" elsewhere, however, you won't be able to access that string when head points to another location.

There is a way to alter the heart message, which is to go to the individual array elements:


heart[7]= 'M';

or


*(heart + 7) = 'M';

The elements of an array are variables (unless the array was declared as const), but the name is not a variable.

Let's go back to a pointer initialization:


char * word = "frame";

Can you use the pointer to change this string?


word[1] = 'l';  // allowed??

Your compiler probably will allow this, but, under the current C standard, the behavior for such an action is undefined. Such a statement could, for example, lead to memory access errors. The reason is that a compiler can choose to represent all identical string literals with a single copy in memory. For example, the following statements could all refer to a single memory location of string "Klingon":


char * p1 = "Klingon";
p1[0] = 'F';    // ok?
printf("Klingon");
printf(": Beware the %ss!\n", "Klingon");

That is, the compiler can replace each instance of "Klingon" with the same address. If the compiler uses this single-copy representation and allows changing p1[0] to 'F', that would affect all uses of the string, so statements printing the string literal "Klingon" would actually display "Flingon":


Flingon: Beware the Flingons!

In fact, several compilers do behave in this rather confusing way, whereas others produce programs that abort. Therefore, the recommended practice for initializing a pointer to a string literal is to use the const modifier:


const char * p1 = "Klingon";  // recommended usage

Initializing a non-const array with a string literal, however, poses no such problems, because the array gets a copy of the original string.