Descriptors in Python

Reprint: http://www.informit.com/articles/printerfriendly.aspx?p=1309289


Python descriptors have been around for a long time; they were introduced back in Python 2.2. But they're still not widely understood or used. This article shows how to create descriptors and presents three examples of use. All three examples run under Python 3.0, although the first two can be back-ported to any Python version from 2.2 onward simply by changing each class definition to inherit from object, and by replacing uses of str.format() with the % string formatting operator. The third example is more advanced, combining descriptors with class decorators (introduced with Python 2.6 and 3.0) to produce a uniquely powerful effect.

What Are Descriptors?

A descriptor is a class that implements one or more of the special methods, __get__(), __set__(), and __delete__(). Descriptor instances are used to represent the attributes of other classes. For example, if we had a descriptor class called MyDescriptor, we might define a class that used it like this:

class MyClass:
    a = MyDescriptor("a")
    b = MyDescriptor("b")

(In Python 2.x versions, we would write class MyClass(object): to make the class a new-style class.) The MyClass class now has two attributes, a and b, which are accessible as self.a and self.b in MyClass objects. How these attributes behave depends entirely on the implementation of the MyDescriptor class, and this is what makes descriptors so versatile and powerful. In fact, Python itself uses descriptors to implement properties and static methods.
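The article does not show MyDescriptor itself, so the following is only a minimal sketch of what such a class might look like. It assumes the descriptor simply stores each value in the instance's __dict__ under the name it was given; a real descriptor could do anything in these three methods.

```python
class MyDescriptor:
    """Hypothetical descriptor that keeps each value in the instance's __dict__."""

    def __init__(self, name):
        self.name = name  # key used in the instance's __dict__

    def __get__(self, instance, owner=None):
        if instance is None:  # accessed on the class, e.g. MyClass.a
            return self
        return instance.__dict__.get(self.name)  # None until a value is set

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]


class MyClass:
    a = MyDescriptor("a")
    b = MyDescriptor("b")


obj = MyClass()
obj.a = 10     # routed through MyDescriptor.__set__()
print(obj.a)   # routed through MyDescriptor.__get__(); prints 10
del obj.a      # routed through MyDescriptor.__delete__()
print(obj.a)   # prints None, since the stored value is gone
```

Because the class defines __set__(), attribute access always goes through the descriptor, even though the data lives in the instance.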

Now we'll look at three examples that use descriptors for three completely different purposes so that you can start to see what can be achieved with descriptors. The first two examples show read-only attributes; the third example shows editable attributes. None of the examples covers deletable attributes (using __delete__()), since use cases are rather rare.

Using Descriptors to Compute Attributes

We often create classes holding data items that we want to access in more than one form. For example, imagine that we have a Person class that holds a person's salutation, forename, and surname. Typically we want to display the complete name in some standard form; for example, the salutation, the first initial of the forename, a period, and then the surname. We could easily do this by creating an additional "display name" instance variable whenever a new Person object was created. But this approach has some disadvantages:

  • We must store an extra string for every person, even if we rarely use it.
  • If the salutation, forename, or surname is changed, we must update the display name.

It would be much better if we could just create the display name when it was needed—this technique would avoid the need to store or update an extra string for each person. Here's how the definition of such a Person class might look:

class Person:

    display_name = DisplayName()

    def __init__(self, salutation, forename, surname):
        self.salutation = salutation
        self.forename = forename
        self.surname = surname

Four attributes are accessible through every instance of the Person class:

  • self.salutation
  • self.forename
  • self.surname
  • self.display_name

The self.display_name attribute is represented by an instance of the DisplayName descriptor class (so it doesn't use any memory for each Person instance), yet it's accessed like any other attribute. (In this case, we've made it read-only; as you'll see shortly, the descriptor has a getter but no setter.)

For example:

fred = Person("", "Fred", "Bloggs")
assert fred.display_name == "F. Bloggs"
jane = Person("Ms", "Jane", "Doe")
assert jane.display_name == "Ms J. Doe"

The implementation of the descriptor class is very simple:

class DisplayName:

    def __get__(self, instance, owner=None):
        parts = []
        if instance.salutation:
            parts.append(instance.salutation)
        if instance.forename:
            parts.append(instance.forename[0] + ".")
        parts.append(instance.surname)
        return " ".join(parts)

When an attribute is read via a descriptor, the descriptor's __get__() method is called. The self argument is the descriptor instance, the instance argument is the object through which the attribute was accessed, and the owner argument is that object's class. So, when fred.display_name is used, the __get__() method of the Person class's DisplayName descriptor is called with the descriptor object stored as Person.display_name as the self argument, fred as the instance argument, and Person as the owner argument.
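To make the three arguments concrete, here is a small sketch using a hypothetical TracingDescriptor (not part of the article) whose __get__() simply returns whatever it was passed.

```python
class TracingDescriptor:
    """Hypothetical descriptor that reports the arguments __get__() receives."""

    def __get__(self, instance, owner=None):
        return (self, instance, owner)


class Person:
    display_name = TracingDescriptor()


fred = Person()
descriptor, instance, owner = fred.display_name

# vars(Person) is the raw class dictionary, so this lookup bypasses
# the descriptor protocol and yields the descriptor object itself.
print(descriptor is vars(Person)["display_name"])  # True
print(instance is fred)                            # True
print(owner is Person)                             # True
```

When the attribute is read on the class instead (Person.display_name), the instance argument is None.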

Of course, the same goal could be achieved by using a display name property. By using a descriptor, however, we can create as many display name attributes as we like, in as many different classes as we like—all getting their behavior from the descriptor with a single line of code for each one. This design eases maintenance; if we need to change how the display name attribute works (perhaps to change the format of the string it returns), we have to change the code in only one place—in the descriptor—rather than in individual property functions for each relevant attribute in every affected class.
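As an illustration of that reuse, the sketch below attaches the DisplayName descriptor to two unrelated classes. Author and Customer are hypothetical names chosen for this example; the only requirement is that their instances have salutation, forename, and surname attributes.

```python
class DisplayName:
    def __get__(self, instance, owner=None):
        parts = []
        if instance.salutation:
            parts.append(instance.salutation)
        if instance.forename:
            parts.append(instance.forename[0] + ".")
        parts.append(instance.surname)
        return " ".join(parts)


class Author:  # hypothetical class
    display_name = DisplayName()  # one line gives the class the behavior

    def __init__(self, salutation, forename, surname):
        self.salutation = salutation
        self.forename = forename
        self.surname = surname


class Customer:  # hypothetical class
    display_name = DisplayName()

    def __init__(self, salutation, forename, surname, account):
        self.salutation = salutation
        self.forename = forename
        self.surname = surname
        self.account = account


print(Author("Dr", "Ada", "Lovelace").display_name)     # Dr A. Lovelace
print(Customer("", "Grace", "Hopper", 42).display_name)  # G. Hopper
```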

Clearly, descriptors can be useful for reducing the memory footprint of classes in which attributes can be computed. This footprint can be reduced still further by using slots; for example, by adding this line to the Person class:

__slots__ = ("salutation", "forename", "surname")

If the computation is expensive, we could cache the results in the descriptor, using the technique shown in the next section.
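For the __slots__ variant mentioned above, a sketch of the complete Person class might look as follows. The descriptor is a class attribute, so it needs no slot of its own, and instances no longer carry a per-instance __dict__.

```python
class DisplayName:
    def __get__(self, instance, owner=None):
        parts = []
        if instance.salutation:
            parts.append(instance.salutation)
        if instance.forename:
            parts.append(instance.forename[0] + ".")
        parts.append(instance.surname)
        return " ".join(parts)


class Person:
    __slots__ = ("salutation", "forename", "surname")
    display_name = DisplayName()  # class attribute, so it needs no slot

    def __init__(self, salutation, forename, surname):
        self.salutation = salutation
        self.forename = forename
        self.surname = surname


jane = Person("Ms", "Jane", "Doe")
print(jane.display_name)          # Ms J. Doe
print(hasattr(jane, "__dict__"))  # False: no per-instance dictionary

try:
    jane.nickname = "JD"          # not listed in __slots__
except AttributeError as err:
    print("rejected:", err)
```

The price of the saving is that instances can no longer be given arbitrary new attributes.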

Using Descriptors to Store Data

Let's look at a use of descriptors that in some ways is the complete opposite of what we've just seen. In some situations, we may prefer to store all or some of a class' data outside the class, while at the same time being able to access the data through instance attributes in the normal way.

For example, suppose we need to store large numbers of Book objects, each holding the details of a particular book. Imagine further that for some of the books we need to output the book's details as a bibliographic entry in the DocBook XML format, and that when such output is required once, it's very likely to be required again. One way of handling this situation is to use a descriptor to generate the XML—and to cache what it generates.

Here's a class that uses such a descriptor:

class Book:

    biblioentry = BiblioEntry()

    def __init__(self, isbn, title, forename, surname, year):
        self.isbn = isbn
        self.title = title
        self.forename = forename
        self.surname = surname
        self.year = year

No biblioentry data is held in Book instances. When the data is requested (for example, book.biblioentry) for the first time, the entry is computed; on the second and subsequent requests, the computed entry is returned immediately. Here's the descriptor's definition:

import xml.sax.saxutils

class BiblioEntry:

    def __init__(self):
        self.cache = {}  # maps id(book) to the generated XML string

    def __get__(self, instance, owner=None):
        entry = self.cache.get(id(instance), None)
        if entry is not None:
            return entry
        # Adjacent string literals are joined into a single template.
        template = (
            "<biblioentry><abbrev>{surname}{yr:02d}</abbrev>"
            "<authorgroup><author><firstname>{forename}</firstname>"
            "<surname>{surname}</surname></author></authorgroup>"
            "<copyright><year>{year}</year></copyright>"
            "<isbn>{isbn}</isbn><title>{title}</title></biblioentry>\n")
        entry = template.format(
            yr=(instance.year - 2000 if instance.year >= 2000
                else instance.year - 1900),
            forename=xml.sax.saxutils.escape(instance.forename),
            surname=xml.sax.saxutils.escape(instance.surname),
            title=xml.sax.saxutils.escape(instance.title),
            isbn=instance.isbn, year=instance.year)
        self.cache[id(instance)] = entry
        return entry

Structurally, the code is quite similar to the previous example, but here we create a cache whose keys are unique instance IDs and whose values are XML biblioentry strings, suitably escaped. By using the cache we ensure that the expensive computation is done only once for each book for which it's needed. We chose to store IDs rather than the instances themselves, to avoid forcing the Book instances to be hashable. Bear in mind that id() values are unique only among objects that exist at the same time, so an entry left behind by a discarded Book could be returned for a later Book that happens to reuse the same ID.

By caching, we're trading memory for speed; whether that's the right tradeoff can only be determined on a case-by-case basis. Another issue to note: When using caching, if the data changes, some or all of the cache's contents become invalid. Since the details of published books don't change, this isn't a problem in this example, but if changes were common we would have to cope with them. One approach is to use a "dirty" flag and ignore the cache if instance.dirty is True; another approach is to access the cache itself and clear it.
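The "dirty" flag approach could be sketched roughly as follows. Everything here is an assumption made for illustration: the CachedSummary descriptor, the simplified two-field Book class, and the choice to set the flag from __setattr__() are not taken from the article.

```python
class CachedSummary:
    """Hypothetical caching descriptor that honors a per-instance dirty flag."""

    def __init__(self):
        self.cache = {}

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        key = id(instance)
        if instance.dirty or key not in self.cache:
            # Recompute, store the result, and mark the instance clean again.
            self.cache[key] = "{0} ({1})".format(instance.title, instance.year)
            instance.dirty = False
        return self.cache[key]


class Book:  # simplified stand-in for the article's Book class
    summary = CachedSummary()

    def __init__(self, title, year):
        self.title = title
        self.year = year
        self.dirty = False

    def __setattr__(self, name, value):
        # Any change to a data attribute invalidates the cached summary.
        object.__setattr__(self, name, value)
        if name != "dirty":
            object.__setattr__(self, "dirty", True)


book = Book("Python", 2008)
print(book.summary)  # Python (2008)
book.year = 2009     # sets book.dirty to True
print(book.summary)  # Python (2009), recomputed because the book was dirty
```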

You can provide access to an attribute's underlying descriptor by adding two lines at the beginning of the descriptor's __get__() method:

def __get__(self, instance, owner=None):
    if instance is None:
        return self
    # ...

If we had the above lines at the start of the BiblioEntry's __get__() method, we could clear the cache like this:

Book.biblioentry.cache.clear()
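The effect of those two lines can be seen in a reduced example. The Squared descriptor and Number class below are hypothetical stand-ins for BiblioEntry and Book, used only to keep the sketch short.

```python
class Squared:
    """Hypothetical caching descriptor that allows class-level access."""

    def __init__(self):
        self.cache = {}

    def __get__(self, instance, owner=None):
        if instance is None:  # reached via the class: Number.squared
            return self
        key = id(instance)
        if key not in self.cache:
            self.cache[key] = instance.value ** 2
        return self.cache[key]


class Number:
    squared = Squared()

    def __init__(self, value):
        self.value = value


n = Number(7)
print(n.squared)                   # 49, computed and cached
print(len(Number.squared.cache))   # 1
Number.squared.cache.clear()       # class access returns the descriptor itself
print(len(Number.squared.cache))   # 0
```

Without the instance is None check, Number.squared would try to read None.value and raise an AttributeError.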

Now that we've seen how we can provide both computed and stored attributes using descriptors, let's look at a third use of descriptors: validation.

Combining Descriptors with Class Decorators for Validation

Let's examine how to create attributes that are readable and writable and that are validated whenever they're set. We'll start by seeing an example of use, and then consider how to make the use possible:

@ValidString("name", empty_allowed=False)
@ValidNumber("price", minimum=0, maximum=1e6)
@ValidNumber("quantity", minimum=1, maximum=1000)
class StockItem:

    def __init__(self, name, price, quantity):
        self.name = name
        self.price = price
        self.quantity = quantity

The StockItem class in this example has three attributes: self.name, self.price, and self.quantity. The first attribute must be a string and cannot be set to be empty. The second and third attributes must be numbers and can only be set to values in the ranges specified. For example:

cameras = StockItem("Camera", 45.99, 2)
cameras.quantity += 1  # works fine, quantity is now 3
cameras.quantity = -2  # raises ValueError("quantity -2 is too small")

The validation is achieved by combining class decorators with descriptors.

A class decorator takes a class as its sole argument and returns a class (usually the same one, modified), which is then bound to the name of the class it was passed. This feature allows us to take a class, process it in some way, and produce a modified version of the class. And just as with function and method decorators, we can apply as many class decorators as we like, each one modifying the class further to produce the class we want.

In the code shown above, it looks like we've used a class decorator that takes multiple arguments, but that's not the case. The ValidString() and ValidNumber() functions take various arguments, and both return a class decorator; the decorator they return is used to decorate the class. Let's look at the ValidString() function, since it's the shorter and simpler of the two:
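The distinction between a class decorator and a function that returns one can be shown with a deliberately trivial sketch. The add_attribute() factory and the Invoice and Receipt classes are hypothetical and have nothing to do with validation; they only illustrate the mechanics.

```python
def add_attribute(name, value):
    """Hypothetical factory: returns a class decorator that sets cls.<name>."""
    def decorator(cls):
        setattr(cls, name, value)
        return cls  # hand the same, now modified, class back
    return decorator


@add_attribute("currency", "USD")
@add_attribute("tax_rate", 0.2)
class Invoice:
    pass


# The decorated form above is equivalent to applying the returned
# decorators by hand, innermost (closest to the class) first:
class Receipt:
    pass

Receipt = add_attribute("currency", "USD")(
    add_attribute("tax_rate", 0.2)(Receipt))

print(Invoice.currency, Invoice.tax_rate)  # USD 0.2
print(Receipt.currency, Receipt.tax_rate)  # USD 0.2
```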

def ValidString(attr_name, empty_allowed=True):
    def decorator(cls):
        name = "__" + attr_name
        def getter(self):
            return getattr(self, name)
        def setter(self, value):
            assert isinstance(value, str), (attr_name +
                                            " must be a string")
            if not empty_allowed and not value:
                raise ValueError(attr_name +
                                 " may not be empty")
            setattr(self, name, value)
        setattr(cls, attr_name, GenericDescriptor(getter, setter))
        return cls
    return decorator

The function takes two arguments: the name of the attribute to validate, and one validation criterion (in this case, whether the attribute can be empty). Inside the function we create a decorator function. The decorator takes a class as argument and arranges for the data to be held in a private attribute whose name is the attribute name prefixed with two underscores. For example, the "name" attribute used in the example will have its data held in an instance attribute named __name. (Because that attribute is read and written with getattr() and setattr(), no name mangling is applied to it.)

Next, a getter function is created that uses Python's getattr() function to return the attribute with the given name. Then a setter function is created, and here the validation code is written and the setattr() function is used to set the new value. After defining the getter and setter, setattr() is called on the class to create a new attribute with the given name (for instance, self.name), and this attribute's value is set to be a descriptor of type GenericDescriptor. Finally, the decorator function returns the modified class, and the ValidString() function returns the decorator function.

The class decorated with one or more uses of ValidString() will have two new attributes added for each use. For example, if the name given is "name", the attributes will be self.__name (which will hold the actual data) and self.name (a descriptor through which the data can be accessed).

Now that we've seen how the decorator is created, we're ready to see the—remarkably simple—descriptor:

class GenericDescriptor:

    def __init__(self, getter, setter):
        self.getter = getter
        self.setter = setter

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.getter(instance)

    def __set__(self, instance, value):
        return self.setter(instance, value)

The GenericDescriptor takes a getter and a setter function and uses them to get and set the attribute data in the given instance. This means that the data is held in the instance, not in the descriptor—the descriptor purely provides the means of accessing the data by using its getter and setter functions.
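To see where the data ends up, the following sketch puts the article's ValidString() and GenericDescriptor together and inspects the result. The single-attribute StockItem used here is a cut-down version of the earlier class, kept short for illustration.

```python
class GenericDescriptor:
    def __init__(self, getter, setter):
        self.getter = getter
        self.setter = setter

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.getter(instance)

    def __set__(self, instance, value):
        return self.setter(instance, value)


def ValidString(attr_name, empty_allowed=True):
    def decorator(cls):
        name = "__" + attr_name

        def getter(self):
            return getattr(self, name)

        def setter(self, value):
            assert isinstance(value, str), attr_name + " must be a string"
            if not empty_allowed and not value:
                raise ValueError(attr_name + " may not be empty")
            setattr(self, name, value)

        setattr(cls, attr_name, GenericDescriptor(getter, setter))
        return cls
    return decorator


@ValidString("name", empty_allowed=False)
class StockItem:  # cut-down version with a single validated attribute
    def __init__(self, name):
        self.name = name


item = StockItem("Camera")
print(vars(item))                              # {'__name': 'Camera'}
print(type(vars(StockItem)["name"]).__name__)  # GenericDescriptor
```

The instance dictionary holds only the data, while the class dictionary holds the descriptor that mediates access to it.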

The ValidNumber() function is almost identical to the ValidString() function, the only differences being the arguments it takes (specifying the minimum and maximum acceptable values), and the setter it creates. Here's an extract that just shows the setter:

# Extract from ValidNumber(); needs "import numbers" at module level.
def setter(self, value):
    assert isinstance(value, numbers.Number), (
            attr_name + " must be a number")
    if minimum is not None and value < minimum:
        raise ValueError("{0} {1} is too small".format(
                            attr_name, value))
    if maximum is not None and value > maximum:
        raise ValueError("{0} {1} is too big".format(
                            attr_name, value))
    setattr(self, name, value)

The numbers.Number abstract base class is used to identify any kind of number.
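A quick check of which values the isinstance() test accepts may be helpful. The sample values below are chosen purely for illustration.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

# All of these count as numbers, including bool (a subclass of int).
accepted = (3, 4.5, 2 + 1j, Fraction(1, 3), Decimal("9.99"), True)
for value in accepted:
    assert isinstance(value, numbers.Number)

# Strings that merely look like numbers do not, and neither do None or lists.
rejected = ("3", None, [1, 2])
for value in rejected:
    assert not isinstance(value, numbers.Number)

print("all checks passed")
```

Note that a complex value passes this test but cannot be compared with < or >, so the range checks in the setter would raise a TypeError for it.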

The decorator/descriptor pattern shown here can be used to create validation functions for any type of data.
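As one possible variation, here is a sketch of a validator that restricts an attribute to a fixed set of choices. ValidChoice() and the Shirt class are hypothetical and do not appear in the original article; the sketch follows the structure of ValidString() and reuses GenericDescriptor unchanged.

```python
class GenericDescriptor:
    def __init__(self, getter, setter):
        self.getter = getter
        self.setter = setter

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.getter(instance)

    def __set__(self, instance, value):
        return self.setter(instance, value)


def ValidChoice(attr_name, choices):
    """Hypothetical validator: the value must be one of the given choices."""
    allowed = tuple(choices)

    def decorator(cls):
        name = "__" + attr_name

        def getter(self):
            return getattr(self, name)

        def setter(self, value):
            if value not in allowed:
                raise ValueError("{0} {1!r} is not an allowed choice".format(
                                 attr_name, value))
            setattr(self, name, value)

        setattr(cls, attr_name, GenericDescriptor(getter, setter))
        return cls
    return decorator


@ValidChoice("size", ("S", "M", "L"))
class Shirt:  # hypothetical class
    def __init__(self, size):
        self.size = size


shirt = Shirt("M")
print(shirt.size)  # M
try:
    shirt.size = "XXL"
except ValueError as err:
    print(err)     # size 'XXL' is not an allowed choice
```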

Conclusion

Descriptors are a powerful and useful Python feature, especially when combined with class decorators. Although descriptors are declared as class attributes, they're accessed as instance attributes; if we want to associate data with a descriptor, we must either compute that data when it's requested (as in the first example), or provide storage for it (for example, using a cache, as in the second example). If we want to store descriptor-related data in instances, we can do so by using a class decorator to create a data attribute to hold the data and a descriptor attribute to provide mediated access to the data, as the third example showed.

Like most of Python's advanced features, descriptors and class decorators are not the first tools to try when a problem needs to be solved. But they should always be kept in mind because, in the right circumstances, they can provide clean and elegant solutions that aren't easily achieved by other means.