Pointers and Memory

Source: Internet | Published by: Wuhan University Network (武汉大学网络) | Editor: 程序博客网 | Date: 2024/05/18 00:54

Pointers and Memory[^1]

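irresistibly: in a way that cannot be resisted; extremely temptingly

programming construct: a structural element of a programming language

back and forth: repeatedly from one place to another and back again

intuitively: in a way that is easy to grasp without explicit reasoning

reference: a value that refers to another value

pointee: the object that a pointer points to

dereference: to access the pointee indirectly through a pointer

manipulation: handling or operating on something

unary: taking a single operand

Allocation And Deallocation: reserving memory and releasing it

lexical scoping: scoping determined by where a name appears in the source text

caller: the function that makes a call

callee: the function that is called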

Why Have Pointers
Pointers solve two common software problems. First, pointers allow different sections of code to share information easily. You can get the same effect by copying information back and forth, but pointers solve the problem better. Second, pointers enable complex “linked” data structures like linked lists and binary trees.
The NULL Pointer
The constant NULL is a special pointer value which encodes the idea of “points to nothing.” It turns out to be convenient to have a well-defined pointer value which represents the idea that a pointer does not have a pointee. Dereferencing a NULL pointer is an error: in C it is undefined behavior, and in practice it usually crashes the program at run time.
The C language uses the symbol NULL for this purpose. NULL compares equal to the integer constant 0, so a NULL pointer can play the role of a boolean false. C++ code written before C++11 often used the integer constant 0 directly instead of NULL; since C++11 the keyword nullptr is preferred. Java uses the symbol null.
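As a minimal sketch of the NULL convention in C, the function below tests a pointer before dereferencing it. The name value_or_default and its fallback parameter are hypothetical, chosen only for illustration.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical helper: returns the pointee's value, or `fallback`
   when p has no pointee. */
int value_or_default(const int *p, int fallback) {
    if (p == NULL) {       /* could also be written: if (!p) */
        return fallback;   /* never dereference a NULL pointer */
    }
    return *p;             /* safe: p refers to a real int */
}
```

Calling it with the address of an int returns that int; calling it with NULL returns the fallback instead of crashing.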
Shallow and Deep Copying
In particular, sharing can enable communication between two functions. One function passes a pointer to the value of interest to another function. Both functions can access the value of interest, but the value of interest itself is not copied. This communication is called “shallow” since instead of making and sending a (large) copy of the value of interest, a (small) pointer is sent and the value of interest is shared. The recipient needs to understand that it has a shallow copy, so it knows not to change or delete the value since it is shared. The alternative where a complete copy is made and sent is known as a “deep” copy. Deep copies are simpler in a way, since each function can change its own copy without interfering with the other copy, but deep copies run slower because of all the copying.
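The difference can be sketched in C with a string as the value of interest. The function names share_text and copy_text are hypothetical; the deep copy assumes the caller will free() the result.

```c
#include <stdlib.h>
#include <string.h>

/* Shallow: hand back the same pointer. Nothing is copied, so the
   caller and the recipient share one pointee. */
const char *share_text(const char *text) {
    return text;
}

/* Deep: allocate a new block and copy the characters into it.
   Returns NULL if the allocation fails; the caller must free() it. */
char *copy_text(const char *text) {
    size_t size = strlen(text) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, text, size);
    }
    return copy;
}
```

A later change to the original string is visible through the pointer from share_text, but not through the block returned by copy_text.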
Bad Pointers
When a pointer is first allocated, it does not have a pointee. The pointer is “uninitialized” or simply “bad”. A dereference operation on a bad pointer is a serious runtime error. Each pointer must be assigned a pointee before it can support dereference operations.
Pointer Type Syntax
A pointer type in C is just the pointee type followed by an asterisk (*):

int*                 type: pointer to int
float*               type: pointer to float
struct fraction*     type: pointer to struct fraction
struct fraction**    type: pointer to struct fraction*
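A short sketch of these types in use follows. The fields of struct fraction (numerator, denominator) and the function name are assumptions made for illustration; the notes above do not define them.

```c
/* Assumed layout: the notes name struct fraction but not its fields. */
struct fraction {
    int numerator;
    int denominator;
};

/* Hypothetical helper: declares one variable of each pointer type and
   returns the numerator reached through a struct fraction**. */
int numerator_via_double_pointer(void) {
    int i = 42;
    float f = 1.5f;
    struct fraction half = {1, 2};

    int *int_ptr = &i;                            /* pointer to int */
    float *float_ptr = &f;                        /* pointer to float */
    struct fraction *frac_ptr = &half;            /* pointer to struct fraction */
    struct fraction **frac_ptr_ptr = &frac_ptr;   /* pointer to struct fraction* */

    *int_ptr = 43;       /* writes to i through the pointer */
    *float_ptr = 2.5f;   /* writes to f through the pointer */

    /* Two dereferences: first to the struct fraction*, then to the struct. */
    return (*frac_ptr_ptr)->numerator;
}
```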

Pointer Rules Summary

No matter how complex a pointer structure gets, the list of rules remains short.

- A pointer stores a reference to its pointee. The pointee, in turn, stores something useful.
- The dereference operation on a pointer accesses its pointee. A pointer may only be dereferenced after it has been assigned to refer to a pointee. Most pointer bugs involve violating this one rule.
- Allocating a pointer does not automatically assign it to refer to a pointee. Assigning the pointer to refer to a specific pointee is a separate operation which is easy to forget.
- Assignment between two pointers makes them refer to the same pointee, which introduces sharing.
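The rules can be walked through in a few lines of C. This is only an illustrative sketch; the function name pointer_rules_demo is hypothetical.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical walk-through of the pointer rules; returns the final
   value of x as seen through p. */
int pointer_rules_demo(void) {
    int x = 1;        /* the pointee: stores something useful */
    int *p = NULL;    /* allocating p does not give it a pointee */
    int *q = NULL;

    p = &x;           /* separate step: assign p a pointee */
    *p = 13;          /* dereference: stores 13 in x */

    q = p;            /* pointer assignment: q and p now share x */
    *q = 42;          /* changes x through q */

    return *p;        /* p sees the change made through q */
}
```

Initializing p and q to NULL is a defensive habit rather than a requirement; it turns an accidental early dereference into a recognizable NULL dereference.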
The Term “Reference”
The word “reference” means almost the same thing as the word “pointer”. The difference is that “reference” tends to be used in a discussion of pointer issues that is not specific to any particular language or implementation. The word “pointer” connotes the common C/C++ implementation of pointers as addresses. The word “reference” is also used in the phrase “reference parameter”, a technique that uses pointer parameters for two-way communication between functions.
Allocation And Deallocation
The terminology is that a variable is allocated when it is given an area of memory to store its value. While the variable is allocated, it can operate as a variable in the usual way to hold a value. A variable is deallocated when the system reclaims the memory from the variable, so it no longer has an area to store its value. For a variable, the period of time from its allocation until its deallocation is called its lifetime.
Synonyms For “Local”
Local variables are also known as “automatic” variables since their allocation and deallocation are done automatically as part of the function call mechanism. Local variables are also sometimes known as “stack” variables because, at a low level, languages almost always implement local variables using a stack structure in memory.
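The automatic lifetime of a local variable can be sketched in C as follows. The function square is a hypothetical example; the buggy variant is shown only inside a comment.

```c
/* Hypothetical example: `n` and `result` are allocated when square()
   is called and deallocated when it returns. */
int square(int n) {
    int result = n * n;   /* lives only during this call */
    return result;        /* the value is copied out before deallocation */
}

/* By contrast, returning the address of a local hands the caller a
   pointer whose pointee has already been deallocated:

       int *bad_square(int n) { int result = n * n; return &result; }

   Dereferencing that pointer is an error. */
```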
Extra: How Does The Function Call Stack Work?
You do not need to know how local variables are implemented during a function call, but here is a rough outline of the steps if you are curious. The exact details of the implementation are language and compiler specific. However, the basic structure below approximates the method used by many different systems and languages.
To call a function such as foo(6, x+1):
1. Evaluate the actual parameter expressions, such as the x+1, in the caller’s context.
2. Allocate memory for foo()’s locals by pushing a suitable “local block” of memory onto a runtime “call stack” dedicated to this purpose. For parameters but not local variables, store the values from step (1) into the appropriate slot in foo()’s local block.
3. Store the caller’s current address of execution (its “return address”) and switch execution to foo().
4. foo() executes with its local block conveniently available at the end of the call stack.
5. When foo() is finished, it exits by popping its locals off the stack and “returns” to the caller using the previously stored return address. Now the caller’s locals are on the end of the stack and it can resume executing.
For the extremely curious, here are other miscellaneous notes on the function call process:
- This is why infinite recursion results in a “Stack Overflow Error”: the code keeps calling and calling, resulting in steps (1) (2) (3), (1) (2) (3), but never a step (5), and eventually the call stack runs out of memory.
- This is why local variables have random initial values: step (2) just pushes the whole local block in one operation. Each local gets its own area of memory, but the memory will contain whatever the most recent tenant left there. Clearing the whole local block on every function call would cost too much time.
- The “local block” is also known as the function’s “activation record” or “stack frame”. The entire block can be pushed onto the stack (step 2) in a single CPU operation, so it is very fast.
- In a multithreaded environment, each thread gets its own call stack instead of sharing a single, global call stack.
- For performance reasons, some languages pass some parameters through registers and others through the stack, so the overall process is complex. However, the apparent lifetime of the variables will always follow the “stack” model presented here.
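The stack model described above can be observed with a small recursive function in C. This is a sketch with a hypothetical function name, sum_to, and it assumes n is not negative.

```c
/* Hypothetical example: every call to sum_to() gets its own local
   block holding its own `n` and `rest`. Assumes n >= 0; a negative n
   would recurse until the call stack runs out of memory. */
int sum_to(int n) {
    if (n == 0) {
        return 0;               /* deepest call: pop its block and return */
    }
    int rest = sum_to(n - 1);   /* evaluate n - 1, push a new block, jump */
    return n + rest;            /* this call's n is intact after the nested call */
}
```

For sum_to(4), five local blocks are on the call stack at the deepest point, one for each value of n from 4 down to 0, and they are popped in reverse order as the calls return.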
Heap Memory
“Heap” memory, also known as “dynamic” memory, is an alternative to local stack memory. Local memory is quite automatic — it is allocated automatically on function call and it is deallocated automatically when a function exits. Heap memory is different in every way. The programmer explicitly requests the allocation of a memory “block” of a particular size, and the block continues to be allocated until the programmer explicitly requests that it be deallocated. Nothing happens automatically. So the programmer has much greater control of memory, but with greater responsibility since the memory must now be actively managed.
- Allocation
  The heap is a large area of memory available for use by the program. The program can request areas, or “blocks”, of memory for its use within the heap. In order to allocate a block of some size, the program makes an explicit request by calling the heap allocation function. The allocation function reserves a block of memory of the requested size in the heap and returns a pointer to it.
  Each allocation request reserves a contiguous area of the requested size in the heap and returns a pointer to that new block to the program. The heap block pointers are sometimes known as “base address” pointers since by convention they point to the base (lowest address byte) of the block.
- Deallocation
  When the program is finished using a block of memory, it makes an explicit deallocation request to indicate to the heap manager that the program is now finished with that block. The heap manager updates its private data structures to show that the area of memory occupied by the block is free again and so may be re-used to satisfy future allocation requests.
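In C, the allocation and deallocation requests are calls to malloc() and free(). The sketch below uses a hypothetical helper, make_sequence, and assumes count is greater than zero.

```c
#include <stdlib.h>

/* Hypothetical helper: requests a heap block big enough for `count`
   ints and fills it with 0, 1, ..., count - 1. Returns the block's
   base address, or NULL if the request cannot be satisfied.
   Assumes count > 0. */
int *make_sequence(size_t count) {
    int *block = malloc(count * sizeof *block);   /* explicit allocation */
    if (block == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        block[i] = (int)i;
    }
    return block;   /* the block stays allocated after this function exits */
}
```

The caller owns the returned block and makes the explicit deallocation request with free(block) once it is finished with it.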
Programming The Heap
Programming the heap looks pretty much the same in most languages. The basic features are:
- The heap is an area of memory available to allocate areas (“blocks”) of memory for the program.
- There is some “heap manager” library code which manages the heap for the program. The programmer makes requests to the heap manager, which in turn manages the internals of the heap. In C, the heap is managed by the ANSI library functions malloc(), free(), and realloc().
- The heap manager uses its own private data structures to keep track of which blocks in the heap are “free” (available for use) and which blocks are currently in use by the program and how large those blocks are. Initially, all of the heap is free.
- The heap may be of a fixed size (the usual conceptualization), or it may appear to be of a fixed but extremely large size backed by virtual memory. In either case, it is possible for the heap to get “full” if all of its memory has been allocated and so it cannot satisfy an allocation request. The allocation function will communicate this run-time condition in some way to the program — usually by returning a NULL pointer or raising a language specific run-time exception.
- The allocation function requests a block in the heap of a particular size. The heap manager selects an area of memory to use to satisfy the request, marks that area as “in use” in its private data structures, and returns a pointer to the heap block. The caller is now free to use that memory by dereferencing the pointer. The block is guaranteed to be reserved for the sole use of the caller: the heap will not hand out that same area of memory to some other caller. The block does not move around inside the heap; its location and size are fixed once it is allocated. Generally, when a block is allocated, its contents are random. The new owner is responsible for setting the memory to something meaningful. Sometimes there is a variation on the memory allocation function which sets the block to all zeros (calloc() in C).
- The deallocation function is the opposite of the allocation function. The program makes a single deallocation call to return a block of memory to the heap free area for later re-use. Each block should only be deallocated once. The deallocation function takes as its argument a pointer to a heap block previously furnished by the allocation function. The pointer must be exactly the same pointer returned earlier by the allocation function, not just any pointer into the block. After the deallocation, the program must treat the pointer as bad and not access the deallocated pointee.
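A minimal C sketch of these features follows, built on the standard calloc(), realloc(), and free(). The wrapper names (make_zeroed, resize_ints, release_ints) are hypothetical, and the sizes passed in are assumed to be greater than zero.

```c
#include <stdlib.h>

/* Returns a block of `count` ints set to all zeros, or NULL if the
   heap cannot satisfy the request. */
int *make_zeroed(size_t count) {
    return calloc(count, sizeof(int));
}

/* Asks for a block of `new_count` ints with the old contents kept.
   realloc() may return a different base address, in which case the old
   pointer must be treated as bad. Returns 1 on success; on failure
   returns 0 and leaves *block valid and unchanged. */
int resize_ints(int **block, size_t new_count) {
    int *resized = realloc(*block, new_count * sizeof **block);
    if (resized == NULL) {
        return 0;
    }
    *block = resized;
    return 1;
}

/* Deallocates the block exactly once, using the base address, then
   overwrites the caller's pointer so the stale address cannot be
   dereferenced by accident. */
void release_ints(int **block) {
    free(*block);
    *block = NULL;
}
```

Setting the pointer to NULL after free() is a common defensive habit, not something the heap manager requires.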
Memory Leaks
A program which forgets to deallocate a block is said to have a “memory leak”, which may or may not be a serious problem. The result is that the heap gradually fills up as allocation requests continue with no deallocation requests to return blocks for re-use.
Memory leaks are more of a problem for a program which runs for an indeterminate amount of time. In that case, the memory leaks can gradually fill the heap until allocation requests cannot be satisfied, and the program stops working or crashes.
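As an illustrative sketch, the hypothetical function below allocates a temporary heap block on every call. With the free() call in place nothing leaks; removing that one line would leak one block per call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: builds `text` repeated twice in a temporary
   heap block and returns its length. Returns 0 if allocation fails. */
size_t doubled_length(const char *text) {
    size_t len = strlen(text);
    char *buffer = malloc(2 * len + 1);
    if (buffer == NULL) {
        return 0;
    }
    memcpy(buffer, text, len);             /* first copy */
    memcpy(buffer + len, text, len + 1);   /* second copy plus '\0' */
    size_t result = strlen(buffer);

    free(buffer);   /* without this line, every call leaks the block */
    return result;
}
```

In a short-lived program the missing free() might go unnoticed, but in a long-running one each call would shrink the memory available for later allocation requests.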

[^1]: http://cslibrary.stanford.edu/102/
[^2]: http://cslibrary.stanford.edu/
