C# Heap(ing) Vs Stack(ing) in .NET: Part IV


Even though with the .NET framework we don't have to actively worry about memory management and garbage collection (GC), we still have to keep memory management and GC in mind in order to optimize the performance of our applications. Also, having a basic understanding of how memory management works will help explain the behavior of the variables we work with in every program we write.  In this article we'll look into Garbage Collection (GC) and some ways to keep our applications running efficiently by using static class members.

Smaller Feet == More Efficient Allocation.

To understand why a smaller footprint is more efficient, we need to delve a bit deeper into the anatomy of .NET memory allocation and Garbage Collection (GC).

Graphing

Let's look at this from the GC's point of view. If we are responsible for "taking out the trash" we need a plan to do this effectively. Obviously, we need to determine what is garbage and what is not (this might be a bit painful for the pack-rats out there). 

In order to determine what needs to be kept, we'll first make the assumption that everything not being used is trash (those piles of old papers in the corner, the box of junk in the attic, everything in the closets, etc.)  Imagine we live with our two good friends: Joseph Ivan Thomas (JIT) and Cindy Lorraine Richmond (CLR). Joe and Cindy keep track of what they are using and give us a list of things they need to keep. We'll call the initial list our "root" list because we are using it as a starting point.  We'll be keeping a master list to graph where everything is in the house that we want to keep. Anything that is needed to make things on our list work will be added to the graph (if we're keeping the TV we don't throw out the remote control for the TV, so it will be added to the list. If we're keeping the computer the keyboard and monitor will be added to the "keep" list).

This is how the GC determines what to keep as well. It receives a list of "root" object references to keep from the just-in-time (JIT) compiler and the common language runtime (CLR) (remember Joe and Cindy?) and then recursively searches object references to build a graph of what should be kept.

Roots consist of:

  • Global/Static pointers. One way to make sure our objects are not garbage collected is to keep a reference to them in a static variable.
  • Pointers on the stack. We don't want to throw away what our application's threads still need in order to execute.
  • CPU register pointers. Anything in the managed heap that is pointed to by an address held in a CPU register should be preserved (don't throw it out).

In the above diagram, objects 1, 3, and 5 in our managed heap are referenced from a root: 1 and 5 are directly referenced, and 3 is found during the recursive search. If we go back to our analogy and object 1 is our television, object 3 could be our remote control. After all objects are graphed we are ready to move on to the next step, compacting.
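The graphing step can be sketched with a toy model. This is only an illustration, not how the CLR is implemented: the "heap" here is a hypothetical dictionary from object id to the ids it references, and an explicit stack replaces the recursion.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the graphing step: "heap" maps an object id to the ids it references.
public static class MarkSketch
{
    public static HashSet<int> Mark(Dictionary<int, int[]> heap, IEnumerable<int> roots)
    {
        var keep = new HashSet<int>();
        var pending = new Stack<int>(roots);
        while (pending.Count > 0)
        {
            int id = pending.Pop();
            if (!keep.Add(id)) continue;      // already graphed
            foreach (int child in heap[id])   // follow outgoing references
                pending.Push(child);
        }
        return keep;
    }

    public static void Main()
    {
        var heap = new Dictionary<int, int[]>
        {
            { 1, new[] { 3 } },   // the TV references its remote control
            { 2, new int[0] },
            { 3, new int[0] },
            { 4, new int[0] },
            { 5, new int[0] }
        };
        var sorted = new List<int>(Mark(heap, new[] { 1, 5 }));
        sorted.Sort();
        Console.WriteLine(string.Join(", ", sorted)); // prints "1, 3, 5"
    }
}
```

Objects 2 and 4 never make it onto the "keep" list, which is exactly what marks them as garbage.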

Compacting

Now that we have graphed what objects we will keep, we can just move the "keeper objects" around to pack things up.

Fortunately, in our house we don't need to clean out the space before we put something else there. Since Object 2 is not needed, as the GC we'll move Object 3 down and fix the pointer in Object 1.

Next, as the GC, we'll copy Object 5 down.

Now that everything is cleaned up we just need to write a sticky note and put it on the top of our compacted heap to let Cindy know where to put new objects.

Knowing the nitty-gritty of GC helps in understanding that moving objects around can be very taxing. If we can reduce the size of what we have to move, we'll improve the whole GC process because there will be less to copy.
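To make the cost of compacting concrete, here is a minimal sketch under simplifying assumptions: the heap is an array, an "address" is an array index, and each object holds at most one reference. The Slot and CompactSketch names are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy heap slot: an id plus the address (array index) of one referenced object, or -1.
public sealed class Slot
{
    public int Id;
    public int RefAddress = -1;
}

public static class CompactSketch
{
    // Slides survivors toward address 0, patches references, and returns the
    // next free address (the "sticky note" for new allocations).
    public static int Compact(Slot[] heap, HashSet<int> keepIds)
    {
        var forward = new Dictionary<int, int>(); // old address -> new address
        int next = 0;
        for (int addr = 0; addr < heap.Length; addr++)
        {
            Slot s = heap[addr];
            if (s == null || !keepIds.Contains(s.Id)) continue;
            forward[addr] = next;
            heap[next] = s;
            next++;
        }
        for (int addr = next; addr < heap.Length; addr++) heap[addr] = null; // freed space
        for (int addr = 0; addr < next; addr++)
        {
            // Survivors only reference survivors, so the lookup always succeeds.
            if (heap[addr].RefAddress >= 0)
                heap[addr].RefAddress = forward[heap[addr].RefAddress];
        }
        return next;
    }

    public static void Main()
    {
        var heap = new Slot[5];
        for (int i = 0; i < 5; i++) heap[i] = new Slot { Id = i + 1 };
        heap[0].RefAddress = 2; // object 1 points at object 3 (address 2)

        int free = Compact(heap, new HashSet<int> { 1, 3, 5 });
        Console.WriteLine(free);               // prints 3
        Console.WriteLine(heap[0].RefAddress); // prints 1: the pointer in object 1 was fixed
        Console.WriteLine(heap[2].Id);         // prints 5
    }
}
```

Every surviving object is touched at least once and every reference to a moved object has to be rewritten, so the work grows with the amount of live data.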

What about things outside the managed heap?

As the person responsible for garbage collection, one problem we run into in cleaning house is how to handle objects in the car. When cleaning, we need to clean everything up. What if the laptop is in the house and the batteries are in the car?

There are situations where the GC needs to execute code to clean up non-managed resources such as files, database connections, network connections, etc. One possible way to handle this is through a finalizer.

class Sample
{
    ~Sample()
    {
        // FINALIZER: CLEAN UP HERE
    }
}

 

Object 2 is treated in the usual fashion. However, when we get to object 4, the GC sees that it is on the finalization queue, and instead of reclaiming the memory object 4 owns, it moves object 4 and adds its finalizer to a special queue named freachable.

There is a dedicated thread for executing freachable queue items. Once the finalizer is executed by this thread on Object 4, it is removed from the freachable queue. Then and only then is Object 4 ready for collection.

So Object 4 lives on until the next round of GC.
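The two-pass behavior can be traced with a toy simulation. It assumes the setup used in this article (objects 1, 4, and 5 have finalizers; objects 2 and 4 are no longer referenced) and models the queues with plain collections. ToyGc is a made-up class, not a wrapper around the real collector.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: ids stand in for objects; this does not call the real GC.
public class ToyGc
{
    public readonly HashSet<int> Heap = new HashSet<int>();
    public readonly HashSet<int> FinalizationQueue = new HashSet<int>(); // objects with finalizers
    public readonly Queue<int> Freachable = new Queue<int>();

    public void Allocate(int id, bool hasFinalizer)
    {
        Heap.Add(id);
        if (hasFinalizer) FinalizationQueue.Add(id);
    }

    // One GC pass: "reachable" is the set of ids found by graphing from the roots.
    public void Collect(HashSet<int> reachable)
    {
        foreach (int id in new List<int>(Heap))
        {
            if (reachable.Contains(id) || Freachable.Contains(id)) continue;
            if (FinalizationQueue.Remove(id))
                Freachable.Enqueue(id);   // survives this pass, waits for its finalizer
            else
                Heap.Remove(id);          // plain garbage: reclaimed now
        }
    }

    // Stands in for the dedicated finalizer thread.
    public void RunFinalizers()
    {
        while (Freachable.Count > 0) Freachable.Dequeue(); // the finalizer would run here
    }

    public static void Main()
    {
        var gc = new ToyGc();
        for (int id = 1; id <= 5; id++) gc.Allocate(id, id == 1 || id == 4 || id == 5);
        var reachable = new HashSet<int> { 1, 3, 5 };

        gc.Collect(reachable);
        Console.WriteLine(gc.Heap.Contains(2)); // prints False: reclaimed in pass one
        Console.WriteLine(gc.Heap.Contains(4)); // prints True: still waiting on its finalizer

        gc.RunFinalizers();
        gc.Collect(reachable);
        Console.WriteLine(gc.Heap.Contains(4)); // prints False: reclaimed in pass two
    }
}
```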

Because adding a finalizer to our classes creates additional work for the GC, it can be very expensive and adversely affect the performance of garbage collection and thus our program. Only use finalizers when you are absolutely sure you need them.

A better practice is to clean up non-managed resources ourselves. It is preferable to explicitly close connections and use the IDisposable interface for cleaning up instead of a finalizer where possible.

IDisposable

Classes that implement IDisposable perform clean-up in the Dispose() method (the interface's only member). Suppose we have a ResourceUser class that cleans up in a finalizer, and then the same class rewritten to implement IDisposable instead:

public class ResourceUser
{
    ~ResourceUser() // THIS IS A FINALIZER
    {
        // DO CLEANUP HERE
    }
}

// The same class using IDisposable instead of a finalizer
public class ResourceUser : IDisposable
{
    #region IDisposable Members

    public void Dispose()
    {
        // CLEAN UP HERE!!!
    }

    #endregion
}
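When a class really does need a finalizer as a safety net, a common arrangement in .NET code is to have Dispose() do the cleanup and then call GC.SuppressFinalize so the object no longer needs the extra finalization pass. The sketch below is one way to write that; SafeResourceUser is a hypothetical class and the bool field merely stands in for a real unmanaged handle.

```csharp
using System;

// Hypothetical resource holder: the bool stands in for a real unmanaged handle.
public class SafeResourceUser : IDisposable
{
    private bool _open = true;
    public bool IsOpen { get { return _open; } }

    public void Dispose()
    {
        Release();
        // Cleanup is done, so tell the GC it can skip the finalizer for this object.
        GC.SuppressFinalize(this);
    }

    // Safety net: does the cleanup if Dispose() was never called.
    ~SafeResourceUser()
    {
        Release();
    }

    private void Release()
    {
        if (!_open) return; // safe to call more than once
        _open = false;
        // release the unmanaged resource here
    }
}

public static class DisposeDemo
{
    public static void Main()
    {
        var user = new SafeResourceUser();
        user.Dispose();
        user.Dispose(); // a second call is harmless
        Console.WriteLine(user.IsOpen); // prints False
    }
}
```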

 

 

// Option 1: create the object first, then hand it to using
public static void DoSomething()
{
    ResourceUser rec = new ResourceUser();

    using (rec)
    {
        // DO SOMETHING
    } // DISPOSE CALLED HERE

    // DON'T ACCESS rec HERE
}

// Option 2: declare the object inside the using statement
public static void DoSomething()
{
    using (ResourceUser rec = new ResourceUser())
    {
        // DO SOMETHING
    } // DISPOSE CALLED HERE
}
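It can help to see what a using block amounts to. The sketch below compares it with a hand-written try/finally that behaves roughly the same way; TracedResource is a hypothetical stand-in for ResourceUser that simply records whether Dispose() ran.

```csharp
using System;

public class TracedResource : IDisposable   // hypothetical stand-in for ResourceUser
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}

public static class UsingDemo
{
    public static TracedResource WithUsing()
    {
        TracedResource rec = new TracedResource();
        using (rec)
        {
            // DO SOMETHING
        }
        return rec;
    }

    // Roughly what the using block above does, written out by hand.
    public static TracedResource WithTryFinally()
    {
        TracedResource rec = new TracedResource();
        try
        {
            // DO SOMETHING
        }
        finally
        {
            if (rec != null) rec.Dispose();
        }
        return rec;
    }

    public static void Main()
    {
        Console.WriteLine(WithUsing().Disposed);      // prints True
        Console.WriteLine(WithTryFinally().Disposed); // prints True
    }
}
```

Because the call sits in a finally block, Dispose() runs even when the body throws.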

 

Static Methods

Static methods belong to the type, not the instance of our object. This enables us to create items that are shared by all our classes and "trim the fat" so to speak. A static method can be called without creating an object at all, and no hidden "this" reference is passed with the call. Keep in mind that the CLR stores the compiled code of any method, static or instance, once per type rather than once per object, so making a method static does not shrink the instances themselves; the savings come from the objects we no longer have to allocate just to call it. The gain is largest for helper methods that never touch instance state.

Here are the details...

Let's say we have a class with a public method SayHello():

class Dude
{
    private string _Name = "Don";

    public void SayHello()
    {
        Console.WriteLine(this._Name + " says Hello");
    }
}

 

A (possibly) more efficient way is to make the method static so that we only have one "SayHello()" in memory no matter how many Dudes are around. Because static members are not instance members we can't use a reference to "this" and have to pass variables into the method to accomplish the same result. 

class Dude
{
    private string _Name = "Don";

    public static void SayHello(string pName)
    {
        Console.WriteLine(pName + " says Hello");
    }
}

 

Keep in mind what happens on the stack when we pass variables (see part II of this series). We have to decide on a case-by-case basis whether using a static method gives us improved performance. For instance, if a static method requires too many parameters and does not have very much internal logic (a small footprint), it is entirely possible we would lose more efficiency in calling a static method than we would gain.
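As a small illustration of the trade-off, the sketch below keeps the logic in a static helper and lets an instance method forward its private name to it. Greeting and MyGreeting are hypothetical names introduced for this example, and the helper returns a string rather than printing so the result is easy to check.

```csharp
using System;

public class Dude
{
    private string _Name = "Don";

    // Static helper: builds the greeting from its argument and touches no instance state.
    public static string Greeting(string pName)
    {
        return pName + " says Hello";
    }

    // Instance method forwards its own private name to the static helper.
    public string MyGreeting()
    {
        return Greeting(_Name);
    }
}

public static class DudeDemo
{
    public static void Main()
    {
        Console.WriteLine(Dude.Greeting("Jane"));   // prints "Jane says Hello" (no Dude object needed)
        Console.WriteLine(new Dude().MyGreeting()); // prints "Don says Hello"
    }
}
```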

Static Variables: Watch Out!

There are a couple of things we want to watch out for with static variables. If we have a class with a static method that we want to return a unique number, the following implementation will be buggy when more than one thread calls it:

class Counter
{
    private static int s_Number = 0;

    public static int GetNextNumber()
    {
        // Not atomic: another thread can read s_Number before we update it
        int newNumber = s_Number;

        // DO SOME STUFF

        s_Number = newNumber + 1;

        return newNumber;
    }
}

 

We need to explicitly lock the read/write memory operations to static variables in the method so only one thread at a time can execute them. Thread management is a very large topic and there are many ways to approach thread synchronization. Using the lock statement is one way to ensure only one thread can access a block of code at a time. As a best practice, you should lock as little code as possible because threads have to wait in a queue to execute the code in the lock block and it can be inefficient.

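class Counter
{
    private static int s_Number = 0;

    // Dedicated private lock object. Locking on typeof(Counter) would let
    // unrelated code that locks the same Type block or deadlock us.
    private static readonly object s_Lock = new object();

    public static int GetNextNumber()
    {
        lock (s_Lock)
        {
            int newNumber = s_Number;

            // DO SOME STUFF

            newNumber += 1;
            s_Number = newNumber;

            return newNumber;
        }
    }
}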
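When the method does nothing but hand out the next number, a lock-free alternative is Interlocked.Increment from System.Threading, which performs the read, add, and write as a single atomic operation. This is a swapped-in technique rather than the approach used above, and it only applies if the "DO SOME STUFF" step does not have to happen inside the protected region. AtomicCounter is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class AtomicCounter
{
    private static int s_Number = 0;

    // Interlocked.Increment adds 1 and returns the new value as one atomic step.
    public static int GetNextNumber()
    {
        return Interlocked.Increment(ref s_Number);
    }

    public static void Main()
    {
        var seen = new ConcurrentDictionary<int, bool>();
        Parallel.For(0, 1000, i => { seen.TryAdd(GetNextNumber(), true); });
        Console.WriteLine(seen.Count); // prints 1000: no number was handed out twice
    }
}
```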

 

The next thing we have to watch out for is objects referenced by static variables. Remember how anything that is referenced by a "root" is not cleaned up? Here's one of the ugliest examples I can come up with:

using System.Collections.ObjectModel;
using System.IO;

class Olympics
{
    public static Collection<Runner> TryoutRunners;
}

class Runner
{
    private string _fileName;
    private FileStream _fStream;

    public void GetStats()
    {
        FileInfo fInfo = new FileInfo(_fileName);
        _fStream = fInfo.OpenRead(); // opened here and never closed
    }
}
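One way to defuse the open-stream half of this problem is to keep the stream local to the method and wrap it in using, so it is closed before the method returns. The sketch below is an assumption about what GetStats() might need (it just reports the file length); the constructor and the GetStatsLength name are hypothetical.

```csharp
using System;
using System.IO;

public class Runner
{
    private readonly string _fileName;

    public Runner(string fileName) { _fileName = fileName; }

    // Opens the stats file, reads what it needs, and closes it before returning.
    public long GetStatsLength()
    {
        using (FileStream fStream = File.OpenRead(_fileName))
        {
            return fStream.Length;
        } // fStream.Dispose() runs here, even if an exception is thrown
    }
}

public static class RunnerDemo
{
    public static void Main()
    {
        string path = Path.GetTempFileName();   // scratch file for the demo
        File.WriteAllText(path, "12.5");
        var runner = new Runner(path);
        Console.WriteLine(runner.GetStatsLength()); // prints 4
        File.Delete(path);                          // clean up the scratch file
    }
}
```

The Runner objects held by the static collection still stay alive, but each one no longer drags an open file handle along with it.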

 

Singleton

One trick to keep things light is to keep only one instance of a class in memory at all times. To do this we can use the GoF (Gang of Four) Singleton pattern.

public class Earth
{
    private static Earth _instance = new Earth();

    private Earth() { }

    public static Earth GetInstance() { return _instance; }
}
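If the instance is expensive to build and may never be needed, a variation is to defer creation until the first call. The sketch below uses Lazy<T>, which assumes .NET Framework 4.0 or later (it is not available in 2.0); LazyEarth is a hypothetical name, and this is an alternative to the static-field version above rather than a replacement for it.

```csharp
using System;

public sealed class LazyEarth
{
    // Lazy<T> runs the factory once, on first access to Value; its default mode is thread safe.
    private static readonly Lazy<LazyEarth> _instance =
        new Lazy<LazyEarth>(() => new LazyEarth());

    private LazyEarth() { }

    public static LazyEarth GetInstance() { return _instance.Value; }
}

public static class SingletonDemo
{
    public static void Main()
    {
        bool same = object.ReferenceEquals(LazyEarth.GetInstance(), LazyEarth.GetInstance());
        Console.WriteLine(same); // prints True
    }
}
```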

 

The 2.0 Static Class

In the .NET 2.0 Framework we can have a static class, which is a class in which all the members must be static. This is useful for utility classes and can save us memory because the class cannot be instantiated no matter what, so no instances of it are ever allocated on the heap.
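A minimal sketch of such a utility class follows. TemperatureUtil and its method are made up for the illustration.

```csharp
using System;

// A static class can hold only static members and can never be instantiated.
public static class TemperatureUtil   // hypothetical utility class
{
    public static double ToFahrenheit(double celsius)
    {
        return celsius * 9.0 / 5.0 + 32.0;
    }
}

public static class StaticClassDemo
{
    public static void Main()
    {
        Console.WriteLine(TemperatureUtil.ToFahrenheit(100.0)); // prints 212
        // new TemperatureUtil();  // would not compile: a static class cannot be instantiated
    }
}
```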

In Conclusion...

So to wrap up, some things we can do to improve GC performance are:

  1. Clean up. Don't leave resources open! Be sure to close all connections that are opened and clean up all non-managed objects as soon as possible. As a general rule when using non-managed objects, instantiate as late as possible and clean up as soon as possible.
  2. Don't overdo references. Be reasonable when using referenced objects. Remember, if our object is alive, none of its referenced objects will be collected (and so on, and so on). When we are done with something referenced by a class, we can release it by setting the reference to null. One trick I like is setting unused references to a custom lightweight NullObject to avoid getting null reference exceptions. The fewer references lying about when the GC kicks off, the less work the graphing process has to do.
  3. Easy does it with finalizers. Finalizers are expensive during GC, so we should ONLY use them if we can justify it. If we can use IDisposable instead of a finalizer, it will be more efficient because our object can be cleaned up in one GC pass instead of two.
  4. Keep objects and their children together. It is easier on the GC to copy large chunks of memory together instead of having to essentially de-fragment the heap at each pass, so when we declare an object composed of many other objects, we should instantiate them as closely together as possible.
  5. And finally... keep objects lighter by making the methods static where appropriate.
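The NullObject trick from item 2 might look like the following sketch. The ILogger, ConsoleLogger, NullLogger, and Worker types are all hypothetical; the point is only that the released reference is replaced by one shared do-nothing instance instead of null.

```csharp
using System;

public interface ILogger { void Log(string message); }

public sealed class ConsoleLogger : ILogger
{
    public void Log(string message) { Console.WriteLine(message); }
}

// Lightweight stand-in: one shared instance that does nothing.
public sealed class NullLogger : ILogger
{
    public static readonly NullLogger Instance = new NullLogger();
    private NullLogger() { }
    public void Log(string message) { }
}

public class Worker
{
    private ILogger _logger = new ConsoleLogger();

    // Swap the real logger for the shared null object so the old one can be collected.
    public void ReleaseLogger() { _logger = NullLogger.Instance; }

    // Safe to call after ReleaseLogger(): _logger is never null.
    public void DoWork() { _logger.Log("working"); }
}

public static class NullObjectDemo
{
    public static void Main()
    {
        var w = new Worker();
        w.DoWork();        // prints "working"
        w.ReleaseLogger();
        w.DoWork();        // prints nothing and does not throw
    }
}
```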

Next time we'll look even more closely at the GC process and look into ways to check under the hood as your program executes to discover problems that may need to be cleaned up.

Until then,
-Happy coding

 
During object creation, all objects with a finalizer are added to a finalization queue. Let's say objects 1, 4, and 5 have finalizers and are on the finalization queue. Let's look at what happens when objects 2 and 4 are no longer referenced by the application and ready for garbage collection.

We can use IDisposable as a better way to implement the same functionality as the finalizer-based ResourceUser.

IDisposable is integrated with the using keyword. At the end of the using block Dispose() is called on the object declared in using(). The object should not be referenced after the using block because it should be essentially considered "gone" and ready to be cleaned up by the GC.

I like putting the declaration for the object in the using block because it makes more sense visually and rec is no longer available outside of the scope of the using block. While this pattern is more in line with the intention of the IDisposable interface, it is not required.

By using using() with classes that implement IDisposable we can perform our cleanup without putting additional overhead on the GC by forcing it to finalize our objects.

With the instance version of SayHello(), we need a Dude object in memory every time we want to call the method.

If two threads call GetNextNumber() at the same time and both are assigned the same value for newNumber before s_Number is incremented, they will return the same result!

Static Variables: Watch Out... Number 2!

Because the Runner collection is static for the Olympics class, not only will objects in the collection not be released for garbage collection (they are all indirectly referenced through a root), but as you probably noticed, every time we run GetStats() a stream is opened to the file. Because it is not closed and never released by the GC, this code is effectively a disaster waiting to happen. Imagine we have 100,000 runners trying out for the Olympics. We would end up with that many non-collectable objects, each with an open resource. Ouch! Talk about poor performance!

We have a private constructor so only Earth can execute its constructor and make an Earth. We have a static instance of Earth and a static method to get the instance. This particular implementation is thread safe because the CLR ensures thread safe creation of static variables. This is the most elegant way I have found to implement the singleton pattern in C#.
 
 