CLR via C# Summary: Chapter 5, Primitive, Reference, and Value Types

Source: Internet  Editor: 程序博客网  Date: 2024/05/01 09:35

Primitive Types

Any data types the compiler directly supports are called primitive types. Primitive types map directly to types existing in the Framework Class Library (FCL).

[Figure 1]

  1. The primitive type names are short and convenient. However, the author discourages their use, for these reasons:

    • Many programmers assume that int is a 32-bit integer on a 32-bit OS and a 64-bit integer on a 64-bit OS. This is NOT true in C#, where int always maps to System.Int32. Writing Int32 removes the potential confusion.
    • In C#, long maps to System.Int64, but in another language long may map to Int16 or Int32. Someone used to a different language could easily misread the code’s intent. The same risk exists in most languages, not just C#.
    • The FCL has many methods with type names in their method names. So float val = br.ReadSingle(); can look unnatural even though it is correct, whereas Single val = br.ReadSingle(); reads naturally. (In my view, if it looks fine to you, it doesn’t matter much.)

    So the choice does affect how code reads, and we can use the FCL type names where that helps.

  2. Consider this code:

    Int32 i = 5;
    Int64 j = i; // Implicit cast does happen

    Based on the casting rules, this code should NOT compile, because neither of these two types derives from the other. However, it does compile, because the C# compiler has intimate knowledge of primitive types and applies special rules when compiling such code: when it recognizes common programming patterns like this one, it produces the IL needed to make things happen as expected.

  3. C# allows implicit casts if the conversion is “safe”, which means no loss of data is possible. If the conversion is potentially unsafe, it requires an explicit cast. Keep in mind that different compilers can apply different casting rules, so you need to be careful.

  4. Primitive types can be written as literals. A literal is considered to be an instance of the type itself, so you can call the type’s methods on it directly, for example 123.ToString().
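
The conversion rules above can be sketched as follows. This is only an illustration: the class name CastDemo and its helper methods are made up for this example and are not from the book.

```csharp
using System;

public static class CastDemo {
    // Widening Int32 -> Int64 is "safe", so C# allows it implicitly.
    public static Int64 Widen(Int32 value) {
        return value;
    }

    // Narrowing Int64 -> Int32 can lose data, so C# demands an explicit cast.
    // unchecked makes the truncation explicit: the high-order bits are discarded.
    public static Int32 Narrow(Int64 value) {
        return unchecked((Int32) value);
    }

    public static void Main() {
        Console.WriteLine(Widen(5));             // 5
        Console.WriteLine(Narrow(4294967301L));  // 2^32 + 5 truncates to 5

        // A literal is an instance of its type, so methods can be called on it.
        Console.WriteLine(123.ToString() + 456.ToString()); // 123456
    }
}
```

The explicit cast in Narrow is the compiler’s way of making the programmer acknowledge that data may be lost.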

Checked and unchecked primitive type operations

  1. Consider this code:

    Byte b = 100;
    b = (Byte) (b + 200); // b is implicitly widened to Int32 inside the parentheses (no data loss); b now contains 44

    Before the addition is performed, b is expanded to a 32-bit value (or to a 64-bit value if any operand requires more than 32 bits). The sum, 300, doesn’t fit in a Byte, so the cast keeps only the low 8 bits: 300 - 256 = 44.

    In most programming scenarios, silent overflow is undesirable. However, in some rare programming scenarios, such as calculating a hash value or a checksum, this overflow is not only acceptable but is also desired.

    By default, overflow is NOT checked in C#. If you turn checking on, IL instructions such as add are replaced by their overflow-checking versions, such as add.ovf, which throw an OverflowException on overflow. The choice is yours.

    The way to check overflow:

    1. use the /checked+ compiler switch to turn checking on globally (/checked- turns it off)
    2. use the checked and unchecked operators and statements in code: checked(...); unchecked(...); checked{...}; unchecked{...}

    Byte b = 100;
    b = checked((Byte) (b + 200)); // OverflowException is thrown
    b = (Byte) checked(b + 200);   // No OverflowException
    checked {
        Byte c = 100;
        c += 200; // A checked block lets us use +=; this addition is checked too
    }

    Attention: Calling a method within a checked operator or statement has no impact on that method

    checked {
        SomeMethod(300);
        // Assume SomeMethod tries to load 300 into a Byte
        // It will NOT throw an OverflowException if it is not compiled with checked instructions
    }
  2. Programming Recommendation

    • Use signed data types wherever possible. This allows the compiler to detect more overflow/underflow errors. Also, some parts of the class library (such as the Length properties of Array and String) are hard-coded to return signed values.
    • Explicitly use checked around blocks where an unwanted overflow might occur due to invalid input data. You can also catch the resulting OverflowException and recover gracefully.
    • Explicitly use unchecked around blocks where overflow is OK.
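
The recommendations above can be sketched like this. The names OverflowDemo, AddWrapping, and TryAdd are hypothetical and exist only for this example.

```csharp
using System;

public static class OverflowDemo {
    // Overflow is intended here (think checksums), so say so with unchecked.
    public static Byte AddWrapping(Byte a, Byte b) {
        return unchecked((Byte) (a + b));
    }

    // Overflow would be a bug here (think invalid input), so use checked
    // and turn the OverflowException into a failure result.
    public static Boolean TryAdd(Byte a, Byte b, out Byte sum) {
        try {
            sum = checked((Byte) (a + b)); // overflow-checking conversion
            return true;
        }
        catch (OverflowException) {
            sum = 0;
            return false;
        }
    }

    public static void Main() {
        Console.WriteLine(AddWrapping(100, 200));     // 44

        Byte sum;
        Console.WriteLine(TryAdd(100, 200, out sum)); // False
        Console.WriteLine(TryAdd(100, 100, out sum)); // True
        Console.WriteLine(sum);                       // 200
    }
}
```

Spelling out checked or unchecked at each site makes the code behave the same way regardless of the /checked compiler switch.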

Reference Types and Value Types

  1. Value type instances are usually allocated on the thread’s stack and are not managed by the GC (garbage collector), so their use reduces pressure on the managed heap and reduces the number of collections an application requires over its lifetime.
  2. In the .NET Framework SDK documentation, any type called a class is a reference type, while each value type is referred to as a structure or an enumeration. All structures are immediately derived from the System.ValueType abstract type, which is itself immediately derived from System.Object.

    struct SomeVal {
        public Int32 x;
    }

    static void Demo() {
        SomeVal v1 = new SomeVal(); // Allocated on the thread's stack!!! new doesn't mean heap as it does in C++
    }

    In C#, types declared using struct are value types, and types declared using class are reference types.

    SomeVal v1 = new SomeVal(); // line 1
    Int32 a = v1.x;
    SomeVal v2;                 // line 3
    Int32 b = v2.x;             // error CS0170: Use of possibly unassigned field 'x'

    In fact, both line 1 and line 3 produce IL that allocates the instance on the thread’s stack and zeroes the fields. The only difference is that C# “thinks” the instance is initialized if you use the new operator. This is C#’s definite-assignment rule: the compiler rejects any read of a local variable that it cannot prove has been assigned.

  3. In some situations, value types can give better performance. Declare a type as a value type only if ALL of the following are true:

    • The type acts as a primitive type. Specifically, it is a fairly simple type with no members that alter its instance fields; when a type offers no such members, we say that the type is immutable. In fact, it is recommended that many value types mark all their fields as readonly (discussed later in Chap 7).
    • The type doesn’t need to inherit from any other type!!!
    • The type won’t have any other types derived from it!!!

    What’s more, when a value type is passed as an argument or returned from a method, all the fields in the value type instance are copied, which hurts performance. So at least ONE of the following statements also needs to be true:

    • Instances of the type are small (about 16 bytes or less).
    • Instances of the type are large (more than 16 bytes) but are NOT passed as method parameters or returned from methods.
  4. The main advantage of value types is that they’re NOT allocated as objects in the managed heap. Here are some of the ways in which value types and reference types differ:
    • Value type objects have two representations, an unboxed form and a boxed form, while reference types are always in a boxed form.
    • System.ValueType overrides the Equals and GetHashCode methods of System.Object. Because this default implementation has performance issues, you should override both methods in your own value types (the Object Equality and Identity and Object Hash Codes sections below explain how).
    • A value type should NOT introduce any new virtual methods. No methods can be abstract, and all methods are implicitly sealed (can’t be overridden). You CAN’T define a new value type or a new reference type by using a value type as a base type!!!
    • Reference type variables are initialized to null, while value type variables always contain a value (all fields zeroed by default). A special feature called nullable types adds the notion of nullability to a value type.
    • On assignment, a field-by-field copy is made for a value type, while only the memory address is copied for a reference type.
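
The difference in assignment behavior can be seen in a small sketch. ValPoint, RefPoint, and CopyDemo are made-up names for illustration.

```csharp
using System;

// Hypothetical types: same single field, one struct and one class.
internal struct ValPoint { public Int32 X; }
internal sealed class RefPoint { public Int32 X; }

public static class CopyDemo {
    // Returns X of the ORIGINAL struct variable after its copy is modified.
    public static Int32 MutateCopyOfStruct() {
        ValPoint v1 = new ValPoint();
        v1.X = 1;
        ValPoint v2 = v1; // field-by-field copy
        v2.X = 2;         // changes only the copy
        return v1.X;
    }

    // Returns X of the ORIGINAL class variable after its "copy" is modified.
    public static Int32 MutateCopyOfClass() {
        RefPoint r1 = new RefPoint();
        r1.X = 1;
        RefPoint r2 = r1; // copies only the reference
        r2.X = 2;         // changes the single shared object
        return r1.X;
    }

    public static void Main() {
        Console.WriteLine(MutateCopyOfStruct()); // 1
        Console.WriteLine(MutateCopyOfClass());  // 2
    }
}
```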

How the CLR Controls the Layout of a Type’s Fields

  1. To improve performance, the CLR is capable of arranging the fields of a type any way it chooses. (Why and how does this affect performance in detail?) For example, the CLR might reorder fields in memory so that object references are grouped together and data fields are properly aligned and packed.
  2. Tell the CLR what to do by applying the System.Runtime.InteropServices.StructLayoutAttribute attribute on the class or structure you’re defining. Pass LayoutKind.Auto to have the CLR arrange the fields, LayoutKind.Sequential to have the CLR preserve your field layout, or LayoutKind.Explicit to explicitly arrange the fields in memory by using offsets.
  3. MS’s C# compiler selects LayoutKind.Auto for reference types(classes) and LayoutKind.Sequential for value types.
  4. Apply an instance of the System.Runtime.InteropServices.FieldOffsetAttribute attribute to each field, passing to this attribute’s constructor an Int32 indicating the offset (in bytes) of the field’s first byte from the beginning of the instance. Explicit layout is typically used to simulate what would be a union in unmanaged C/C++, because you can have multiple fields starting at the same offset in memory.

    using System;
    using System.Runtime.InteropServices;

    // Let the CLR arrange the fields to improve
    // performance for this value type.
    [StructLayout(LayoutKind.Auto)]
    internal struct SomeValType {
        private readonly Byte m_b;
    }

    // Alternative definition of the same type:
    // the developer explicitly arranges the fields of this value type.
    [StructLayout(LayoutKind.Explicit)]
    internal struct SomeValType {
        [FieldOffset(0)]
        private readonly Byte m_b;  // The m_b and m_x fields overlap each
        [FieldOffset(0)]
        private readonly Int16 m_x; // other in instances of this type
    }

  5. It should be noted that it is illegal to define a type in which a reference type and a value type overlap.
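
To see the union-like behavior of explicit layout in action, here is a minimal sketch. The type ByteOrInt16 and the class LayoutDemo are hypothetical, and the fields are public only so the demo can touch them.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union-like type: both fields start at offset 0.
[StructLayout(LayoutKind.Explicit)]
internal struct ByteOrInt16 {
    [FieldOffset(0)] public Byte AsByte;
    [FieldOffset(0)] public Int16 AsInt16;
}

public static class LayoutDemo {
    // Writes the Int16 field and reads the overlapping Byte field back.
    public static Byte FirstByteOf(Int16 value) {
        ByteOrInt16 u = new ByteOrInt16();
        u.AsInt16 = value;
        return u.AsByte;
    }

    public static void Main() {
        // Which byte of the Int16 sits at offset 0 depends on the machine's byte order.
        Byte expected = BitConverter.IsLittleEndian ? (Byte) 0x34 : (Byte) 0x12;
        Console.WriteLine(FirstByteOf(0x1234) == expected); // True
    }
}
```

Writing one field changes what the other field reads, which is exactly why overlapping a reference with a value type is disallowed.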

Boxing and Unboxing Value Types

  1. In many cases, you must get a reference to an instance of a value type, e.g. public virtual Int32 Add(Object value); for ArrayList. This signature indicates that Add requires a reference (or pointer) to an object on the managed heap as a parameter. So a value type must be converted into a true heap-managed object, and a reference to this object must be obtained. (Note: although ValueType also derives from Object, the fact that the parameter is not declared as ValueType already shows that the method is not asking for a value type, and internally it won’t use anything specific to ValueType.)
  2. What happens when an instance of a value type is boxed:
    1. Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.
    2. The value type’s fields are copied to the newly allocated heap memory.
    3. The address of the object is returned.

  3. Consider this code:

```
using System;
using System.Collections;

// Declare a value type.
struct Point {
    public Int32 x, y;
}

public sealed class Program {
    public static void Main() {
        ArrayList a = new ArrayList();
        Point p; // Allocate a Point (not in the heap).
        for (Int32 i = 0; i < 10; i++) {
            p.x = p.y = i; // Initialize the members in the value type.
            a.Add(p);      // Box the value type and add the
                           // reference to the Arraylist.
        }
        ...
    }
}
```

The Point value type variable (p) can be reused because the `ArrayList` **never knows anything about it**. **Note that** the lifetime of the boxed value type **extends beyond** the lifetime of the unboxed value type.

  4. Note that the FCL now includes a new set of generic collection classes that make the non-generic collection classes obsolete. For example, use the System.Collections.Generic.List<T> class instead of the System.Collections.ArrayList class.

The generic collection classes offer many **improvements** over the non-generic equivalents. For example, the API has been **cleaned up and improved**, and the performance of the collection classes has been greatly improved as well. But one of the biggest improvements is that the generic collection classes allow you to **work with collections of value types without requiring that items in the collection be boxed/unboxed**. This in itself greatly improves performance because far fewer objects will be created on the managed heap, thereby reducing the number of garbage collections required by your application. Furthermore, you will get compile-time type safety, and your source code will be cleaner due to fewer casts.
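
A side-by-side sketch of the two collection styles follows. CollectionDemo and its two methods are illustrative names only.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionDemo {
    // Non-generic: every Add boxes the Int32, and every read needs a cast (unbox).
    public static Int32 SumWithArrayList(Int32 count) {
        ArrayList list = new ArrayList();
        for (Int32 i = 0; i < count; i++) list.Add(i);  // boxes i
        Int32 sum = 0;
        foreach (Object o in list) sum += (Int32) o;     // unboxes each item
        return sum;
    }

    // Generic: List<Int32> stores the values directly; no boxing, no casts.
    public static Int32 SumWithList(Int32 count) {
        List<Int32> list = new List<Int32>();
        for (Int32 i = 0; i < count; i++) list.Add(i);
        Int32 sum = 0;
        foreach (Int32 value in list) sum += value;
        return sum;
    }

    public static void Main() {
        Console.WriteLine(SumWithArrayList(10)); // 45
        Console.WriteLine(SumWithList(10));      // 45
    }
}
```

Both methods compute the same sum, but only the ArrayList version allocates a heap object per element. Adding a String to the List<Int32> would be rejected at compile time, whereas the ArrayList would accept it and fail later at the cast.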

  5. Consider this code:

    Point p = (Point) a[0]; // Unboxing and copy

    Here you’re taking the reference (or pointer) contained in element `0` of the `ArrayList` and trying to put it into a Point value type instance, p. For this to work, all of the fields contained in the boxed Point object must be copied into the value type variable, p, which is on the thread’s stack. The CLR accomplishes this copying in two steps:

    1. Unboxing: the address of the Point fields in the boxed Point object is obtained.
    2. Copying: the values of these fields are copied from the heap to the stack-based value type instance.

    **Mind** that unboxing itself doesn’t involve copying any bytes in memory; it only obtains the address of the fields. An unboxing operation is typically followed by copying the fields.
  6. While unboxing, a NullReferenceException is thrown if the variable containing the reference to the boxed value type instance is null; and if the reference doesn’t refer to an object that is a boxed instance of the desired value type, an InvalidCastException is thrown.

    public static void Main() {
        Int32 x = 5;
        Object o = x;               // Box x; o refers to the boxed object
        Int16 y = (Int16) o;        // Throws an InvalidCastException
        Int16 z = (Int16)(Int32) o; // Unbox to the correct type and cast
    }

  7. Consider this code:

```
public static void Main() {
    Point p;
    p.x = p.y = 1;
    Object o = p; // Boxes p; o refers to the boxed instance

    // Change Point's x field to 2
    p = (Point) o; // Unboxes o AND copies fields from boxed
                   // instance to stack variable
    p.x = 2;       // Changes the state of the stack variable
    o = p;         // Boxes p; o refers to a new boxed instance
}
```

The code at the bottom of this fragment is intended **only** to change `Point`’s x field from `1` to `2`. This involves unboxing/copy and boxing again. A whole new boxed instance in the managed heap is created.
  8. Some languages, such as C++/CLI, allow you to unbox a boxed value type **without copying the fields**. In this case, the unboxed instance’s fields happen to be in a boxed object on the heap, and we can manipulate the data directly. This improves performance by avoiding both the extra allocation and the two copies!

  9. Be aware of the boxing-related operations mentioned above if you are the least bit concerned about your application’s performance. Use ILDasm.exe or .NET Reflector to view the IL code and see where the box IL instructions are.

  10. Reading the IL helps here:

    // Consider this code
    public static void Main() {
        Int32 v = 5;  // Create an unboxed value type variable.
        Object o = v; // o refers to a boxed Int32 containing 5.
        v = 123;      // Changes the unboxed value to 123
        Console.WriteLine(v + ", " + (Int32) o); // Displays "123, 5"
    }

    Boxing occurs THREE times here!!! v is boxed when it is assigned to o, and then both v and the unboxed (Int32) o are boxed again, because the value type instances must be converted to Object to fit String.Concat(Object, Object, Object). When we are very concerned about the performance of some code, we should pay attention to boxing operations.

  11. Basically, if you want a reference to an instance of a value type, the instance must be boxed.

  12. You can still call virtual methods (such as Equals, GetHashCode, or ToString) inherited or overridden by the type. If your value type overrides one of these virtual methods, then the CLR can invoke the method nonvirtually because value types are implicitly sealed and cannot have any types derived from them. In addition, the value type instance being used to invoke the virtual method is NOT boxed. However, if your override of the virtual method calls into the base type’s implementation of the method, then the value type instance does get boxed when calling the base type’s implementation so that a reference to a heap object gets passed to the this pointer into the base method.

  13. In addition, casting an unboxed instance of a value type to one of the type’s interfaces requires the instance to be boxed, because interface variables must always contain a reference to an object on the heap. (Covered later in Chap 13)

  14. Consider this code:

    using System;

    internal struct Point : IComparable {
        private readonly Int32 m_x, m_y;

        // Constructor to easily initialize the fields
        public Point(Int32 x, Int32 y) {
            m_x = x;
            m_y = y;
        }

        // Override ToString method inherited from System.ValueType
        public override String ToString() {
            // Return the point as a string. Note: calling ToString prevents boxing
            return String.Format("({0}, {1})", m_x.ToString(), m_y.ToString());
        }

        // Implementation of type-safe CompareTo method
        public Int32 CompareTo(Point other) {
            return Math.Sign(Math.Sqrt(m_x * m_x + m_y * m_y)
                - Math.Sqrt(other.m_x * other.m_x + other.m_y * other.m_y));
        }

        // Implementation of IComparable's CompareTo method
        public Int32 CompareTo(Object o) {
            if (GetType() != o.GetType()) {
                throw new ArgumentException("o is not a Point");
            }
            // Call type-safe CompareTo method
            return CompareTo((Point) o);
        }
    }

    public static class Program {
        public static void Main() {
            // Create two Point instances on the stack.
            Point p1 = new Point(10, 10);
            Point p2 = new Point(20, 20);

            // p1 does NOT get boxed to call ToString (a virtual method).
            Console.WriteLine(p1.ToString()); // "(10, 10)"
            // (Open question: name hiding or override? Same effect? Do they differ only when polymorphism is involved?)

            // p1 DOES get boxed to call GetType (a non-virtual method).
            Console.WriteLine(p1.GetType()); // "Point"
            // (Open question: what if Point had a method named GetType hiding the non-virtual method in the base class?)

            // p1 does NOT get boxed to call CompareTo.
            // p2 does NOT get boxed because CompareTo(Point) is called.
            Console.WriteLine(p1.CompareTo(p2)); // "-1"

            // p1 DOES get boxed, and the reference is placed in c.
            IComparable c = p1;
            Console.WriteLine(c.GetType()); // "Point"

            // p1 does NOT get boxed to call CompareTo.
            // Because CompareTo is not being passed a Point variable,
            // CompareTo(Object) is called, which requires a reference to
            // a boxed Point.
            // c does NOT get boxed because it already refers to a boxed Point.
            Console.WriteLine(p1.CompareTo(c)); // "0"

            // c does NOT get boxed because it already refers to a boxed Point.
            // p2 does get boxed because CompareTo(Object) is called.
            Console.WriteLine(c.CompareTo(p2)); // "-1"

            // c is unboxed, and fields are copied into p2.
            p2 = (Point) c;

            // Proves that the fields got copied into p2.
            Console.WriteLine(p2.ToString()); // "(10, 10)"
        }
    }

Normally, to call a virtual method, the CLR needs to determine the object’s type in order to locate the type’s method table. However, for a value type, the JIT compiler sees the overridden ToString method and emits code that calls it directly, without boxing. The compiler knows that polymorphism can’t come into play here because value types are implicitly sealed. Note that if Point’s ToString method internally calls base.ToString(), then the value type instance would be boxed when calling System.ValueType’s ToString method.

In the call to the nonvirtual GetType method, p1 does have to be boxed. The reason is that the Point type inherits GetType from System.Object. So to call GetType, the CLR must use a pointer to a type object, which can be obtained only by boxing p1.

Casting to IComparable: When casting p1 to a variable (c) that is of an interface type, p1 must be boxed because interfaces are reference types by definition.

15.

I realize that all of this information about reference types, value types, and boxing might be overwhelming
at first. However, a solid understanding of these concepts is critical to any .NET Framework
developer’s long-term success. Trust me: having a solid grasp of these concepts will allow you to build
efficient applications faster and easier.

Changing Fields in a Boxed Value Type by Using Interfaces (and Why You Shouldn’t Do This)

  1. Some languages, such as C++/CLI, let you change the fields in a boxed value type, but C# does not.
    However, you can fool C# into allowing this by using an interface. The following code is a modified
    version of the previous code.
    using System;

    // Interface defining a Change method
    internal interface IChangeBoxedPoint {
        void Change(Int32 x, Int32 y);
    }

    // Point is a value type.
    internal struct Point : IChangeBoxedPoint {
        private Int32 m_x, m_y;

        public Point(Int32 x, Int32 y) {
            m_x = x;
            m_y = y;
        }

        public void Change(Int32 x, Int32 y) {
            m_x = x; m_y = y;
        }

        public override String ToString() {
            return String.Format("({0}, {1})", m_x.ToString(), m_y.ToString());
        }
    }

    public sealed class Program {
        public static void Main() {
            Point p = new Point(1, 1);
            Console.WriteLine(p); // (1, 1)

            p.Change(2, 2);
            Console.WriteLine(p); // (2, 2)

            Object o = p;
            Console.WriteLine(o); // (2, 2)

            // Unboxes o into a temporary copy, changes the copy and discards it
            ((Point) o).Change(3, 3);
            Console.WriteLine(o); // (2, 2)

            // Boxes p, changes the boxed object and discards it
            ((IChangeBoxedPoint) p).Change(4, 4);
            Console.WriteLine(p); // (2, 2)

            // Changes the boxed object and shows it
            ((IChangeBoxedPoint) o).Change(5, 5); // this points to the address in the managed heap!!!
            Console.WriteLine(o); // (5, 5)
        }
    }

In the last example, the boxed Point referred to by o is cast to an IChangeBoxedPoint. No boxing
is necessary here because o is already a boxed Point. Then Change is called, which does change
the boxed Point’s m_x and m_y fields.

Object Equality and Identity

  1. This section is about how to define a type that properly implements object equality. Mind that equality is about value/content, while identity is about whether two references point to the very same object.

  2. The implementation of Object’s Equals method looks like this.

    public class Object {
        public virtual Boolean Equals(Object obj) {
            // If both references point to the same object,
            // they must have the same value.
            if (this == obj) return true;

            // Assume that the objects do not have the same value.
            return false;
        }
    }

    However, the default implementation of Object’s Equals method really **implements identity, not
    value equality**.

  3. Here is how to properly implement an Equals method internally:

    1. If the obj argument is null, return false.
    2. If the this and obj arguments refer to the same object, return true. Identity implies equality, and this shortcut can improve performance when comparing objects with many fields.
    3. If the this and obj arguments refer to objects of different types, return false.
    4. For each instance field defined by the type, compare the value in the this object with the value in the obj object. If any fields are not equal, return false.
    5. Call the base class’s Equals method so it can compare any fields defined by it. However, if the base class is Object, don’t call its Equals, because it tests identity and would return false for two distinct objects even when their contents are the same.
  4. If you override Object’s Equals method, this Equals method can no longer be called to test for identity. To fix this, Object offers a static ReferenceEquals method, which is implemented like this.

    public class Object {
        public static Boolean ReferenceEquals(Object objA, Object objB) {
            return (objA == objB);
        }
    }

    You should always call ReferenceEquals if you want to check for identity. You shouldn’t use the C# == operator (unless you cast both operands to Object
    first) because one of the operands’ types could overload the == operator, giving it semantics other
    than identity.

  5. BTW, System.ValueType (the base class of all value types) does override
    Object’s Equals method and is correctly implemented to perform a value equality check (not an
    identity check).

  6. Internally, ValueType’s Equals method uses reflection (covered in Chapter 23, “Assembly Loading and Reflection”) to compare each field by calling that field’s Equals method. Because the CLR’s reflection mechanism is slow, when defining your own value type, you should override Equals and provide your own implementation to improve the performance of value equality comparisons that use instances of your type. Of course, in your own implementation, do not call base.Equals.

  7. Properties of equality: reflexive, symmetric, transitive, consistent.

  8. A couple more things to do:

    • Have the type implement the System.IEquatable<T> interface’s Equals method. This generic interface allows you to define a type-safe Equals method. Usually, you’ll implement the Equals method that takes an Object parameter to internally call the type-safe Equals method.
    • Overload the == and != operator methods. Usually, you’ll implement these operator methods to internally call the type-safe Equals method.
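
Putting the points of this section together, a value type with its own equality might look like the sketch below. Money and EqualityDemo are hypothetical names, and the hash combination (multiply by 397, then XOR) is just one common choice, not something the book prescribes. Because Money is a struct, the null, identity, and type checks from the Equals recipe collapse into a single is test.

```csharp
using System;

// Hypothetical immutable value type used only for illustration.
internal struct Money : IEquatable<Money> {
    private readonly Int64 m_cents;
    private readonly String m_currency;

    public Money(Int64 cents, String currency) {
        m_cents = cents;
        m_currency = currency;
    }

    // Type-safe Equals: no boxing, no reflection, does not call base.Equals.
    public Boolean Equals(Money other) {
        return m_cents == other.m_cents
            && String.Equals(m_currency, other.m_currency, StringComparison.Ordinal);
    }

    // Object's Equals forwards to the type-safe version.
    // A null or differently typed argument is never equal.
    public override Boolean Equals(Object obj) {
        return obj is Money && Equals((Money) obj);
    }

    // Equal values must produce equal hash codes.
    public override Int32 GetHashCode() {
        Int32 currencyHash = (m_currency == null) ? 0 : m_currency.GetHashCode();
        return unchecked((m_cents.GetHashCode() * 397) ^ currencyHash);
    }

    public static Boolean operator ==(Money a, Money b) { return a.Equals(b); }
    public static Boolean operator !=(Money a, Money b) { return !a.Equals(b); }
}

public static class EqualityDemo {
    // Value equality through the overloaded == operator.
    public static Boolean SameValue() {
        return new Money(500, "USD") == new Money(500, "USD");
    }

    // Identity check: two separately boxed copies are different objects.
    public static Boolean SameIdentityAfterBoxing() {
        Object a = new Money(500, "USD");
        Object b = new Money(500, "USD");
        return Object.ReferenceEquals(a, b);
    }

    public static void Main() {
        Console.WriteLine(SameValue());               // True
        Console.WriteLine(SameIdentityAfterBoxing()); // False
    }
}
```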

Object Hash Codes

  1. System.Object provides a virtual GetHashCode method so that an Int32 hash code can be obtained for any and all objects, and so that types can customize how it is computed.
  2. Note that if you define a type and override the Equals method, you should also override the GetHashCode method, because some collections in the FCL require that any two objects that are equal must have the same hash code value.
  3. When we add a key/value pair to a collection, a hash code for the key object is obtained first and used to decide which “bucket” the pair is stored in. So we must not change the key object in place: afterwards the collection can no longer find the value, and the entry cannot be removed either, which leads to a memory leak. Instead, we should fetch the pair, remove the original pair, modify the key, and add the new pair back into the hash table.
  4. When selecting an algorithm for calculating hash codes for instances of our type, try to follow these guidelines:
    • A good random distribution gives the best hash table performance.
    • You can call the base type’s GetHashCode method in your algorithm, but generally not that of Object or ValueType, because neither implementation lends itself to high-performance hashing.
    • Use at least one instance field. Ideally, the fields used in your algorithm should be immutable.
    • Execute as quickly as possible.
    • Objects with the same value should return the same code.
  5. Important: Never, ever persist hash code values, because hash code values are subject to change.
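
The danger of changing a key in place can be demonstrated with a short sketch. MutableKey and HashDemo are made-up names, and Dictionary<TKey, TValue> stands in for “the hash table”.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on a MUTABLE field.
internal sealed class MutableKey {
    public Int32 Id;
    public MutableKey(Int32 id) { Id = id; }

    public override Boolean Equals(Object obj) {
        MutableKey other = obj as MutableKey;
        return other != null && other.Id == Id;
    }

    public override Int32 GetHashCode() { return Id; }
}

public static class HashDemo {
    // Mutates the key while it is stored; the entry becomes unreachable.
    public static Boolean FoundAfterInPlaceChange() {
        Dictionary<MutableKey, String> table = new Dictionary<MutableKey, String>();
        MutableKey key = new MutableKey(1);
        table.Add(key, "value");
        key.Id = 2;                    // hash code changes behind the table's back
        return table.ContainsKey(key); // looked up under the new hash code
    }

    // The safe sequence: fetch, remove, modify the key, add back.
    public static Boolean FoundAfterRemoveModifyAdd() {
        Dictionary<MutableKey, String> table = new Dictionary<MutableKey, String>();
        MutableKey key = new MutableKey(1);
        table.Add(key, "value");
        String value = table[key];
        table.Remove(key);
        key.Id = 2;
        table.Add(key, value);
        return table.ContainsKey(key);
    }

    public static void Main() {
        Console.WriteLine(FoundAfterInPlaceChange());   // False
        Console.WriteLine(FoundAfterRemoveModifyAdd()); // True
    }
}
```

Basing GetHashCode only on immutable fields avoids the problem entirely, which is the reason for the guideline above.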

The dynamic Primitive Type

Summarizing this got tedious, so this section is a placeholder for now; I’ll fill it in when I’m in the mood, haha.
