Data References and Anonymous St…

Source: Internet. Posted by: 网络限速上行多少合适. Editor: 程序博客网. Time: 2024/05/16 03:28

Chapter 1: Data References and Anonymous Storage

http://oreilly.com/catalog/advperl/excerpt/ch01.html
In this chapter:

Referring to Existing Variables
Using References
Nested Data Structures
Querying a Reference
Symbolic References
A View of the Internals
References in Other Languages
Resources

If I were meta-agnostic, I'd be confused over whether I'm agnostic or not--but I'm not quite sure if I feel that way; hence I must be meta-meta-agnostic (I guess).
--Douglas R. Hofstadter, Gödel, Escher, Bach

There are two aspects (among many) that distinguish toy programming languages from those used to build truly complex systems. The more robust languages have:

The ability to dynamically allocate data structures without having to associate them with variable names. We refer to these as "anonymous" data structures.
The ability to point to any data structure, independent of whether it is allocated dynamically or statically.

COBOL is the one true exception to this; it has been a huge commercial success in spite of lacking these features. But it is also why you'd balk at developing flight control systems in COBOL. Consider the following statements that describe a far simpler problem: a family tree.

Marge is 23 years old and is married to John, 24.
Jason, John's brother, is studying computer science at MIT. He is just 19.
Their parents, Mary and Robert, are both sixty and live in Florida.
Mary and Marge's mother, Agnes, are childhood friends.

Do you find yourself mentally drawing a network with bubbles representing people and arrows representing relationships between them? Think of how you would conveniently represent this kind of information in your favorite programming language. If you were a C (or Algol, Pascal, or C++) programmer, you would use a dynamically allocated data structure to represent each person's data (name, age, and location) and pointers to represent relationships between people.

A pointer is simply a variable that contains the location of some other piece of data. This location can be a machine address, as it is in C, or a higher-level entity, such as a name or an array offset.

C supports both aspects extremely efficiently: You use malloc(3) to allocate memory dynamically and a pointer to refer to dynamically and statically allocated memory. While this is as efficient as it gets, you tend to spend enormous amounts of time dealing with memory management issues, carefully setting up and modifying complex interrelationships between data, and then debugging fatal errors resulting from "dangling pointers" (pointers referring to pieces of memory that have been freed or are no longer in scope). The program may be efficient; the programmer isn't.

Perl supports both concepts, and quite well, too. It allows you to create anonymous data structures, and supports a fundamental data type called a "reference," loosely equivalent to a C pointer. Just as C pointers can point to data as well as procedures, Perl's references can refer to conventional data types (scalars, arrays, and hashes) and other entities such as subroutines, typeglobs, and filehandles. Unlike C, they don't let you peek and poke at raw memory locations.

Perl excels from the standpoint of programmer efficiency. As we saw earlier, you can create complex structures with very few lines of code because, unlike C, Perl doesn't expect you to spell out everything. A line like this:

$line[19] = "hello";

does in one line what amounts to quite a number of lines in C--allocating a dynamic array of 20 elements and setting the last element to a (dynamically allocated) string. Equally important, you don't spend any time at all thinking about memory management issues. Perl ensures that a piece of data is deleted when no one is pointing at it any more (that is, it ensures that there are no memory leaks) and, conversely, that it is not deleted when someone is still pointing to it (no dangling pointers).
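To see the on-demand growth for yourself, here is a minimal sketch. It assumes a fresh, empty array named @line (the name is taken from the line above); the helper variables $count, $last, and $gap are illustrative only.

```perl
use strict;
use warnings;

my @line;                    # empty array; no size declared anywhere
$line[19] = "hello";         # Perl grows the array to 20 elements on demand

my $count = scalar(@line);   # number of elements now in the array
my $last  = $line[-1];       # the last element is the string we stored
my $gap   = defined($line[0]) ? "defined" : "undef";   # earlier slots hold undef

print "elements: $count, last: $last, first: $gap\n";
# prints "elements: 20, last: hello, first: undef"
```

The nineteen slots in front of the assigned element exist but hold undef until something is stored in them.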

Of course, just because all this can be done does not mean that Perl is an automatic choice for implementing complex applications such as aircraft scheduling systems. However, there is no dearth of other, less complex applications (not just throwaway scripts) for which Perl can more easily be used than any other language.

In this chapter, you will learn the following:

How to create references to scalars, arrays, and hashes and how to access data through them (dereferencing).
How to create and refer to anonymous data structures.
What Perl does internally to help you avoid thinking about memory management.

Referring to Existing Variables

If you have a C background (not necessary for understanding this chapter), you know that there are two ways to initialize a pointer in C. You can refer to an existing variable:

int a, *p;
p = &a;

The memory is statically allocated; that is, it is allocated by the compiler. Alternatively, you can use malloc(3) to allocate a piece of memory at run-time and obtain its address:

p = malloc(sizeof(int));

This dynamically allocated memory doesn't have a name (unlike that associated with a variable); it can be accessed only indirectly through the pointer, which is why we refer to it as "anonymous storage."

Perl provides references to both statically and dynamically allocated storage; in this section, we'll study the former in some detail. That allows us to deal with the two concepts--references and anonymous storage--separately.

You can create a reference to an existing Perl variable by prefixing it with a backslash, like this:

# Create some variables
$a      = "mama mia";
@array  = (10, 20);
%hash   = ("laurel" => "hardy", "nick" =>  "nora");
# Now create references to them
$ra     = \$a;          # $ra now "refers" to (points to) $a
$rarray = \@array;
$rhash  = \%hash;

You can create references to constant scalars in a similar fashion:

$ra     = \10;
$rs     = \"hello world";

That's all there is to it. Since arrays and hashes are collections of scalars, it is possible to take a reference to an individual element the same way: just prefix it with a backslash:

$r_array_element = \$array[1];       # Refers to the scalar $array[1]
$r_hash_element  = \$hash{"laurel"}; # Refers to the scalar $hash{"laurel"}
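A short sketch of what element references buy you, using the same variable names as above. Two things here go beyond what has been covered so far and are used only for illustration: the built-in ref function (the subject of "Querying a Reference") and the $$ prefix (explained under "Dereferencing"). The replacement values are made up.

```perl
use strict;
use warnings;

my @array = (10, 20);
my %hash  = ("laurel" => "hardy", "nick" => "nora");

my $r_array_element = \$array[1];          # refers to the scalar $array[1]
my $r_hash_element  = \$hash{"laurel"};    # refers to the scalar $hash{"laurel"}

# Both are references to scalars, whatever container the scalar lives in.
my $kinds = ref($r_array_element) . " " . ref($r_hash_element);

# Assigning through the reference updates the element in place.
$$r_array_element = 99;
$$r_hash_element  = "hal roach";

print "$kinds\n";                          # prints "SCALAR SCALAR"
print "$array[1] $hash{laurel}\n";         # prints "99 hal roach"
```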

A Reference Is Just Another Scalar

A reference variable, such as $ra or $rarray, is an ordinary scalar--hence the prefix `$'. A scalar, in other words, can be a number, a string, or a reference and can be freely reassigned to one or the other of these (sub)types. If you print a scalar while it is a reference, you get something like this:

SCALAR(0xb06c0)

While a string and a number have direct printed representations, a reference doesn't. So Perl prints out whatever it can: the type of the value pointed to and its memory address. There is rarely a reason to print out a reference, but if you have to, Perl supplies a reasonable default. This is one of the things that makes Perl so productive to use. Don't just sit there and complain, do something. Perl takes this motherly advice seriously.

While we are on the subject, it is important that you understand what happens when references are used as keys for hashes. Perl requires hash keys to be strings, so when you use a reference as a key, Perl uses the reference's string representation (which will be unique, because it is a pointer value after all). But when you later retrieve the key from this hash, it will remain a string and will thus be unusable as a reference. It is possible that a future release of Perl may lift the restriction that hash keys have to be strings, but for the moment, the only recourse to this problem is to use the Tie::RefHash module presented in Chapter 9, Tie. I must add that this restriction is hardly debilitating in the larger scheme of things. There are few algorithms that require references to be used as hash keys and fewer still that cannot live with this restriction.
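The hash-key behavior is easy to check. The following is a small sketch, not a recipe; the hash %seen and the yes/no flags are hypothetical names introduced for the demonstration, and ref is the built-in covered later under "Querying a Reference".

```perl
use strict;
use warnings;

my @array  = (10, 20);
my $rarray = \@array;

my %seen;
$seen{$rarray} = 1;              # the key stored is the string "ARRAY(0x...)"

my ($key) = keys %seen;          # what comes back is a plain string
my $key_is_ref  = ref($key)    ? "yes" : "no";   # ref() is empty for a string
my $orig_is_ref = ref($rarray) ? "yes" : "no";
my $same_text   = ($key eq "$rarray") ? "yes" : "no";

print "key is a reference: $key_is_ref\n";               # no
print "original is a reference: $orig_is_ref\n";         # yes
print "key matches the printed form: $same_text\n";      # yes
```

The key still identifies the array uniquely, so lookups with $seen{$rarray} work; what is lost is the ability to get from the key back to the data.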

Dereferencing

Dereferencing means getting at the value that a reference points to.

In C, if p is a pointer, *p refers to the value being pointed to. In Perl, if $r is a reference, then $$r, @$r, or %$r retrieves the value being referred to, depending on whether $r is pointing to a scalar, an array, or a hash. It is essential that you use the correct prefix for the corresponding type; if $r is pointing to an array, then you must use @$r, and not %$r or $$r. Using the wrong prefix results in a fatal run-time error.

Think of it this way: Wherever you would ordinarily use a Perl variable ($a, @b, or %c), you can replace the variable's name (a, b, or c) by a reference variable (as long as the reference is of the right type). A reference is usable in all the places where an ordinary data type can be used. The following examples show how references to different data types are dereferenced.

References to Scalars

The following expressions involving a scalar,

$a += 2;
print $a;          # Print $a's contents ordinarily

can be changed to use a reference by simply replacing the string "a" by the string "$ra":

$ra = \$a;         # First take a reference to $a
$$ra  += 2;        # instead of $a += 2;
print $$ra;        # instead of print $a

Of course, you must make sure that $ra is a reference pointing to a scalar; otherwise, Perl dies with the run-time error "Not a SCALAR reference".
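Here is a sketch of both outcomes: a correct scalar dereference, and the fatal error you get from applying the scalar prefix to an array reference. The eval block is used only so that the script can trap and show the message instead of dying; the variable names are illustrative.

```perl
use strict;
use warnings;

my $count = 5;
my @array = (10, 20);

my $rcount = \$count;
$$rcount += 2;                   # same effect as $count += 2

my $rarray = \@array;
# Using the scalar prefix on an array reference is a fatal run-time
# error; eval traps it here so the script can report the message.
my $ok    = eval { my $oops = $$rarray; 1 };
my $error = $ok ? "" : $@;

print "count is now $count\n";   # prints "count is now 7"
print "error: $error";           # message starts with "Not a SCALAR reference"
```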

References to Arrays

You can use ordinary arrays in three ways:

Access the array as a whole, using the @array notation. You can print an entire array or push elements into it, for example.
Access single elements using the $array[$i] notation.
Access ranges of elements (slices), using the notation @array[index1,index2,...].

References to arrays are usable in all three of these situations. The following code shows an example of each, contrasting ordinary array usage to that using references to arrays:

$rarray = \@array;
push (@array , "a", 1, 2);   # Using the array as a whole
push (@$rarray, "a", 1, 2);  # Indirectly using the ref. to the array
print $array[$i] ;           # Accessing single elements
print $$rarray[1];           # Indexing indirectly through a
                             # reference: array replaced by $rarray
@sl =  @array[1,2,3];        # Ordinary array slice
@sl =  @$rarray[1,2,3];      # Array slice using a reference

Note that in all these cases, we have simply replaced the string array with $rarray to get the appropriate indirection.

Beginners often make the mistake of confusing array variables and enumerated (comma-separated) lists. For example, putting a backslash in front of an enumerated list does not yield a reference to it:

$s = \('a', 'b', 'c');      # WARNING: probably not what you think

As it happens, this is identical to

$s = (\'a', \'b', \'c');    # List of references to scalars

An enumerated list always yields the last element in a scalar context (as in C), which means that $s contains a reference to the constant string c. Anonymous arrays, discussed later in the section "References to Anonymous Storage," provide the correct solution.
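The difference shows up clearly if you try the same backslashed list in list context and in scalar context. This sketch also previews the anonymous array constructor [ ] from the later section, purely for contrast; all variable names other than $s are made up for the example.

```perl
use strict;
use warnings;

# In list context, \(...) yields one reference per element.
my @refs     = \('a', 'b', 'c');
my $how_many = scalar(@refs);                        # 3
my $kinds    = join ",", map { ref($_) } @refs;      # SCALAR,SCALAR,SCALAR

# In scalar context only the reference to the last element survives.
my $s    = \('a', 'b', 'c');
my $last = $$s;                                      # 'c'

# The anonymous array constructor [...] is what was probably intended.
my $rlist = ['a', 'b', 'c'];
my $size  = scalar(@$rlist);                         # 3

print "$how_many refs ($kinds); scalar case points to '$last'; ",
      "anonymous array holds $size\n";
```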

References to Hashes

References to hashes are equally straightforward:

$rhash = \%hash;
print $hash{"key1"};        # Ordinary hash lookup
print $$rhash{"key1"};      # hash replaced by $rhash

Hash slices work the same way too:

@slice = @$rhash{'key1', 'key2'}; # instead of @hash{'key1', 'key2'}
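The same substitution works for whole-hash operations and for assignment. A small sketch, reusing the %hash contents from earlier in the chapter; the extra key and the array names are assumptions made for the example.

```perl
use strict;
use warnings;

my %hash  = ("laurel" => "hardy", "nick" => "nora");
my $rhash = \%hash;

$$rhash{"abbott"} = "costello";                 # insert through the reference
my @names    = sort keys %$rhash;               # whole-hash access
my @partners = @$rhash{"laurel", "nick"};       # hash slice through the reference

print "keys: @names\n";                         # prints "keys: abbott laurel nick"
print "slice: @partners\n";                     # prints "slice: hardy nora"
print "original hash sees: $hash{abbott}\n";    # prints "original hash sees: costello"
```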

A word of advice: You must resist the temptation to implement basic data structures such as linked lists and trees just because a pointerlike capability is available. For small numbers of elements, the standard array data type has pretty decent insertion and removal performance characteristics and is far less resource intensive than linked lists built using Perl primitives. (On my machine, a small test shows that inserting up to around 1250 elements at the head of a Perl array is faster than creating an equivalent linked list.) And if you want BTrees, you should look at the Berkeley DB library (described in Chapter 10, Persistence) before rolling a Perl equivalent.

Confusion About Precedence

The expressions involving key lookups might cause some confusion. Do you read $$rarray[1] as ${$rarray[1]} or {$$rarray}[1] or ${$rarray}[1]?

(Pause here to give your eyes time to refocus!)

As it happens, the last one is the correct answer. Perl follows these two simple rules while parsing such expressions: (1) Key or index lookups are done at the end, and (2) the prefix closest to a variable name binds most closely. When Perl sees something like $$rarray[1] or $$rhash{"browns"}, it leaves index lookups ([1] and {"browns"}) to the very end. That leaves $$rarray and $$rhash. It gives preference to the `$' closest to the variable name. So the precedence works out like this: ${$rarray} and ${$rhash}. Another way of visualizing the second rule is that the preference is given to the symbols from right to left (the variable is always to the right of a series of symbols).

Note that we are not really talking about operator precedence, since $, @, and % are not operators; the rules above indicate the way an expression is parsed.
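One way to convince yourself of the parsing rule is to set things up so that the competing readings give different answers. In this sketch, @rarray is a hypothetical second variable (an array that happens to share its name with the scalar $rarray), introduced only so that ${$rarray[1]} has something to act on.

```perl
use strict;
use warnings;

my @array  = ("zero", "one", "two");
my $rarray = \@array;                  # a reference to the whole array

my @rarray = (\"alpha", \"beta");      # a separate ARRAY named @rarray,
                                       # holding references to scalars

my $short  = $$rarray[1];              # parsed as ${$rarray}[1]
my $braced = ${$rarray}[1];            # explicit form of the same thing
my $other  = ${$rarray[1]};            # element 1 of @rarray, then dereferenced

print "$short $braced $other\n";       # prints "one one beta"
```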

Shortcuts with the Arrow Notation

Perl provides an alternate and easier-to-read syntax for accessing array or hash elements: the ->[ ] notation. For example, given the array's reference, you can obtain the second element of the array like this:

$rarray = \@array;
print $rarray->[1] ;    # The "visually clean" way

instead of the approaches we have seen earlier:

print $$rarray[1];      # Noisy, and have to think about precedence
print ${$rarray}[1];    # The way to get tendinitis!

I prefer the arrow notation, because it is less visually noisy. Figure 1-1 shows a way to visualize this notation.

Figure 1-1: Visualizing $rarray ->[1]

Similarly, you can use the ->{ } notation to access an element of a hash table:

$rhash = \%hash;
print $rhash->{"k1"};

# instead of ...
print $$rhash{"k1"};
# or
print ${$rhash}{"k1"};

Caution: This notation works only for single indices, not for slices. Consider the following:

print $rarray->[0,2]; # Warning: This is NOT an indirect array slice.

Perl treats the stuff within the brackets as a comma-separated expression that yields its last term: 2. Hence, this expression is equivalent to $rarray->[2], which is an index lookup, not a slice. (Recall the rule mentioned earlier: An enumerated or comma-separated list always returns the last element in a scalar context.)
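If you do want a slice through a reference, use the @ prefix forms shown earlier. The sketch below puts the variants side by side; the array contents are made up, and warnings are switched off around the misleading line only because Perl may complain about that construct.

```perl
use strict;
use warnings;

my @array  = ("a", "b", "c", "d");
my $rarray = \@array;

my $single = $rarray->[2];             # arrow notation: one element
my @slice  = @$rarray[0, 2];           # a real slice through the reference
my @same   = @{$rarray}[0, 2];         # the same slice with explicit braces

my $not_a_slice;
{
    no warnings;                       # perl may warn here; keep the demo quiet
    $not_a_slice = $rarray->[0, 2];    # comma expression: index 2 only
}

print "$single | @slice | @same | $not_a_slice\n";   # prints "c | a c | a c | c"
```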

No Automatic Dereferencing

Perl does not do any automatic dereferencing for you. You must explicitly dereference using the constructs just described. This is similar to C, in which you have to say *p to indicate the object pointed to by p. Consider

$rarray = \@array;
push ($rarray,  1, 2, 3);   # Error: $rarray is a scalar, not an array
push (@$rarray, 1, 2, 3);   # OK

push expects an array as the first argument, not a reference to an array (which is a scalar). Similarly, when printing an array, Perl does not automatically dereference any references. Consider

print "$rarray, $rhash";

This prints

ARRAY(0xc70858), HASH(0xb75ce8)

This issue may seem benign but has ugly consequences in two cases. The first is when a reference is used in an arithmetic or conditional expression by mistake; for example, if you said $a += $r when you really meant to say $a += $$r, you'll get only a hard-to-track bug. The second common mistake is assigning an array to a scalar ($a = @array) instead of the array reference ($a = \@array). Perl does not warn you in either case, and Murphy's law being what it is, you will discover this problem only when you are giving a demo to a customer.
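Both mistakes run without complaint, which is what makes them dangerous. A minimal sketch, with made-up values and variable names, that shows each mistake next to what was probably intended:

```perl
use strict;
use warnings;

my @array  = (10, 20, 30);
my $rarray = \@array;

# Mistake 2 from the text: an array assigned to a scalar gives its length.
my $count = @array;           # 3, the number of elements
my $ref   = \@array;          # what was probably intended: a reference

# Mistake 1 from the text: a reference used directly in arithmetic.
my $right = 5;
$right   += $$rarray[0];      # intended: adds the element 10, giving 15

my $wrong = 5;
$wrong   += $rarray;          # no error: the reference's address is added

print "count=$count, ref is ", ref($ref), ", right=$right\n";
print "wrong differs from right: ", ($wrong != $right ? "yes" : "no"), "\n";
```

Even with warnings enabled, neither line is reported as a problem; the only symptom is a number that makes no sense later on.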

0 0

Data&nbsp;References&nbsp;and&nbsp;Anonymous&nbsp;St…