Python Memory Management

Source: Internet. Editor: 程序博客网. Time: 2024/06/04 20:00

Just because some high-level languages manage memory automatically doesn't mean that memory leaks are impossible. A leak in C or C++ occurs when you allocate memory using malloc or new and never deallocate it. In Python or Java, a memory leak occurs when you accidentally leave a reference to some massive data structure hanging around, so the entire thing cannot be freed. The solution is to find this reference and get rid of it.

I began looking for my memory leak by creating a minimal test that ran a big search. Next, I added "del reference" statements one variable at a time, trying to find which one was holding on to memory. I deleted all my variables and still my program's size did not change. After that, I added calls to force Python's cyclic garbage collector to run, in case I had circular references that were keeping the objects from being deleted. Python still consumed a gigabyte of memory. Finally, I turned to Google to see if anyone else had had a similar problem. I turned up a mailing list thread about Python never returning memory to the operating system. It turns out that this is a flaw in the Python interpreter. People work around it by spawning multiple processes or using data structures on disk. Then I was curious: why would Python choose to never free memory? Many hours later, I now know far more than I ever wanted to about how Python deals with memory.
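The two debugging steps described above (deleting references, then forcing the cyclic garbage collector) can be sketched with the standard `gc` and `weakref` modules. `Node` is a hypothetical class, and the sketch assumes CPython. It shows only that the objects themselves are freed. As described above, freeing the objects did not make the process any smaller.

```python
import gc
import weakref

class Node:
    """Hypothetical object type that can take part in a reference cycle."""
    def __init__(self):
        self.partner = None

gc.disable()                  # keep the automatic collector out of the way for the demo
a, b = Node(), Node()
a.partner, b.partner = b, a   # a <-> b forms a reference cycle
probe = weakref.ref(a)

del a, b                      # no outside references remain...
print(probe() is not None)    # True: inside a cycle the refcounts never reach zero
gc.collect()                  # force the cyclic collector to find and free the cycle
print(probe() is None)        # True
gc.enable()
```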

Python does a lot of allocations and deallocations. All objects, including "simple" types like integers and floats, are stored on the heap. Calling malloc and free for each variable would be very slow. Hence, the Python interpreter uses a variety of optimized memory allocation schemes. The most important one is a malloc implementation called pymalloc, designed specifically to handle large numbers of small allocations. Any object that is smaller than 256 bytes uses this allocator, while anything larger uses the system's malloc. This implementation never returns memory to the operating system. Instead, it holds on to it in case it is needed again. This is efficient when the memory is reused soon, but wasteful if a long time passes before it is needed again.
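To make the pymalloc idea concrete, here is a toy model written in Python. It is not CPython's actual C code. `ToyPoolAllocator` and its fields are hypothetical. The model assumes the 256-byte threshold mentioned above and rounds small requests up to 8-byte size classes. Freed small blocks go onto a per-size free list for reuse instead of back to the "OS", which is modeled as a simple byte counter. Large requests pass straight through.

```python
class ToyPoolAllocator:
    """Hypothetical model of a pymalloc-style allocator (not CPython's code)."""
    SMALL_THRESHOLD = 256  # requests up to this size are served from pools
    ALIGNMENT = 8          # small requests are rounded up to a multiple of this

    def __init__(self):
        self.free_lists = {}  # size class -> list of cached (freed) block ids
        self.os_bytes = 0     # bytes currently obtained from the "operating system"
        self.live = {}        # block id -> (size, is_small)
        self._next_id = 0

    def _new_id(self):
        self._next_id += 1
        return self._next_id

    def _size_class(self, n):
        return (n + self.ALIGNMENT - 1) // self.ALIGNMENT * self.ALIGNMENT

    def malloc(self, n):
        if n > self.SMALL_THRESHOLD:
            # Large request: hand it straight to the system allocator.
            self.os_bytes += n
            block = self._new_id()
            self.live[block] = (n, False)
            return block
        cls = self._size_class(n)
        cached = self.free_lists.setdefault(cls, [])
        if cached:
            block = cached.pop()  # reuse a cached block; no new OS memory
        else:
            self.os_bytes += cls
            block = self._new_id()
        self.live[block] = (cls, True)
        return block

    def free(self, block):
        size, small = self.live.pop(block)
        if small:
            self.free_lists[size].append(block)  # keep it; never returned to the OS
        else:
            self.os_bytes -= size                # large blocks go back to the system


alloc = ToyPoolAllocator()
blocks = [alloc.malloc(24) for _ in range(1000)]
for block in blocks:
    alloc.free(block)
print(alloc.os_bytes)  # 24000: the small blocks are cached, not returned
big = alloc.malloc(10_000)
alloc.free(big)
print(alloc.os_bytes)  # 24000: the large block went straight back
```

The model reproduces the trade-off described in the text. A program that briefly needs many small objects keeps that peak amount reserved, even though later small requests of the same size class are served instantly from the cache.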

So how can Python's memory usage sometimes decrease if it never releases small objects? Common Python objects, such as integers, floats, and lists, maintain their own private memory pools, which are not shared with any other type. Integers and floats use a policy similar to pymalloc: they never get released. Lists, on the other hand, are freed. Python keeps only a small, fixed number of unused list objects (currently 80), which limits the wasted memory. Most importantly, the actual array stored in the list is freed immediately, which is a significant amount of memory for big lists.
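Here is a rough model of the difference between the unbounded integer/float pools and the bounded list cache. `TypeFreeList` is a hypothetical class, and the limit of 80 is the figure quoted above. Real CPython free lists are written in C and differ in detail. This sketch only counts cached objects.

```python
class TypeFreeList:
    """Hypothetical per-type cache of freed objects (a model, not CPython code)."""
    def __init__(self, max_free=None):
        self.max_free = max_free  # None models the unbounded int/float pools
        self.cached = []          # freed objects kept around for reuse
        self.released = 0         # objects whose memory was actually given back

    def release(self, obj):
        if self.max_free is None or len(self.cached) < self.max_free:
            self.cached.append(obj)  # keep it for the next allocation of this type
        else:
            self.released += 1       # cache full: let the memory go

    def acquire(self, factory):
        # Reuse a cached object if there is one, otherwise make a new one.
        return self.cached.pop() if self.cached else factory()


ints = TypeFreeList()             # never gives anything back
lists = TypeFreeList(max_free=80) # keeps at most 80 spare list objects
for _ in range(1000):
    ints.release(object())
    lists.release(object())
print(len(ints.cached), len(lists.cached), lists.released)  # 1000 80 920
```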

There are some serious inefficiencies in the interpreter's memory allocation policies. For example, applications like mine always hold on to their peak memory usage, even though their average usage is much lower. This can be an issue for long-running servers, such as Twisted, Zope, or Plone servers, or for applications where memory is critical. For example, Plone recommends using a caching server with lots of RAM, so having Python release memory that is not currently being used would be helpful in that configuration. Another problem is that the private memory pools are not shared between types. For example, if you allocate and then free a large number of integers, the cached memory cannot be used to allocate floats. I am attempting to rewrite Python's memory allocator to make Python use memory more efficiently. I'll keep you posted on how it goes.
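The cross-type problem described above can be shown with a similarly simplified model. It assumes that each type keeps a private count of cached blocks that no other type may use. `PrivatePools` is hypothetical and only counts blocks. It does not allocate real memory.

```python
from collections import defaultdict

class PrivatePools:
    """Hypothetical model: every type has its own free list, never shared."""
    def __init__(self):
        self.free = defaultdict(int)  # type name -> number of cached blocks
        self.from_os = 0              # blocks that had to be requested from the system

    def alloc(self, kind):
        if self.free[kind]:
            self.free[kind] -= 1      # reuse a block cached for this type only
        else:
            self.from_os += 1         # no cached block of this type: ask the system

    def dealloc(self, kind):
        self.free[kind] += 1          # cached for this type, unusable by others


pools = PrivatePools()
for _ in range(1000):
    pools.alloc("int")
for _ in range(1000):
    pools.dealloc("int")
for _ in range(1000):
    pools.alloc("float")
print(pools.from_os)      # 2000: the 1000 cached int blocks could not be reused
print(pools.free["int"])  # 1000 blocks still parked on the int free list
```

With a single shared pool, the float allocations would have reused the 1000 freed blocks, and `from_os` would have stayed at 1000.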

Original article: http://www.evanjones.ca/python-memory.html

0 0