Building a higher-level query API: the right way to use Django's ORM


This blog post is based on a talk given at the Brighton Python User Group on April 10th, 2012.

Summary

In this article, I'm going to argue that using Django's low-level ORM query methods (filter, order_by, etc.) directly in a view is (usually) an anti-pattern. Instead, we should be building custom domain-specific query APIs at the level of the model layer, where our business logic belongs. Django doesn't make this particularly easy, but by taking a deep-dive into the internals of the ORM, I'll show you some neat ways to accomplish it.

Business-level operations should be abstracted into methods on the object, rather than implemented directly with low-level ORM APIs such as filter and order_by.

Overview

When writing Django applications, we're accustomed to adding methods to our models to encapsulate business logic and hide implementation details. This approach feels completely natural and obvious, and indeed is used liberally throughout Django's built-in apps:

    >>> from django.contrib.auth.models import User
    >>> user = User.objects.get(pk=5)
    >>> user.set_password('super-sekrit')
    >>> user.save()

Here set_password is a method defined on the django.contrib.auth.models.User model, which hides the implementation details of password hashing. The code looks something like this (edited for clarity):

    from django.contrib.auth.hashers import make_password

    class User(models.Model):
        # fields go here..

        def set_password(self, raw_password):
            self.password = make_password(raw_password)

We're building a domain-specific API on top of the generic, low-level object-relational mapping tools that Django gives us. This is basic domain modelling: we're increasing the level of abstraction, making any code that interacts with this API less verbose. The result is more robust, reusable and (most importantly) readable code.

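Django's set_password is a good example of encapsulating a business-level operation. Wrapping the low-level ORM APIs that a business operation touches makes code more robust and reusable and, more importantly, more readable.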

So, we already do this for individual model instances. Why not apply the same idea to the APIs you use to select collections of model instances from the database?

A toy problem: the Todo List

To illustrate the approach, we're going to use a simple todo list app. The usual caveats apply: this is a toy problem. It's hard to show a real-world, useful example without huge piles of code. Don't concentrate on the implementation of the todo list itself: instead, imagine how this approach would work in one of your own large-scale applications.

Here's our application's models.py:

    from django.db import models

    PRIORITY_CHOICES = [(1, 'High'), (2, 'Low')]

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        is_done = models.BooleanField(default=False)
        owner = models.ForeignKey('auth.User')
        priority = models.IntegerField(choices=PRIORITY_CHOICES,
                                       default=1)

Now, let's consider a query we might want to make across this data. Say we're creating a view for the dashboard of our Todo app. We want to show all of the incomplete, high-priority Todos that exist for the currently logged in user. Here's our first stab at the code:

    def dashboard(request):
        todos = Todo.objects.filter(
            owner=request.user
        ).filter(
            is_done=False
        ).filter(
            priority=1
        )
        return render(request, 'todos/list.html', {
            'todos': todos,
        })

(And yes, I know this can be written as request.user.todo_set.filter(is_done=False, priority=1). Remember, toy example!)

Why is this bad?

First, it's verbose. Seven lines (depending on how you prefer to deal with newlines in chained method calls) just to pull out the rows we care about. And, of course, this is just a toy example. Real-world ORM code can be much more complicated.

The chaining that the ORM supports is convenient, but it can make code verbose.

It leaks implementation details. Code that interacts with the model needs to know that there exists a property called is_done, and that it's a BooleanField. If you change the implementation (perhaps you replace the is_done boolean with a status field that can have multiple values) then this code will break.

The code is coupled too tightly to the underlying table structure, which makes it hard to maintain and update.

It's opaque - the meaning or intent behind it is not clear at a glance (which can be summarised as "it's hard to read").

Because the code is verbose and tightly coupled to the table structure, readability is poor.

Finally, it has the potential to be repetitive. Imagine you are given a new requirement: write a management command, called via cron every week, to email all users their list of incomplete, high-priority todo items. You'd have to essentially copy-and-paste these seven lines into your new script. Not very DRY.

The code is also hard to reuse.

Let's summarise this with a bold claim: using low-level ORM code directly in a view is (usually) an anti-pattern.

So, how can we improve on this?

Managers and QuerySets

Before diving into solutions, we're going to take a slight detour to cover some essential concepts.

Django has two closely related constructs for table-level operations: managers and querysets.

A manager (an instance of django.db.models.manager.Manager) is described as "the interface through which database query operations are provided to Django models." A model's Manager is the gateway to table-level functionality in the ORM (model instances generally give you row-level functionality). Every model class is given a default manager, called objects.

A manager is the interface that maps object operations to database operations.

A queryset (django.db.models.query.QuerySet) represents "a collection of objects from your database." It is essentially a lazily-evaluated abstraction of the result of a SELECT query, and can be filtered, ordered and generally manipulated to restrict or modify the set of rows it represents. It's responsible for creating and manipulating django.db.models.sql.query.Query instances, which are compiled into actual SQL queries by the database backends.

A queryset is a collection of objects from the database. Its defining trait is laziness, and it supports iteration and chaining.

Phew. Confused? While the distinction between a Manager and a QuerySet can be explained if you're deeply familiar with the internals of the ORM, it's far from intuitive, especially for beginners.

This confusion is made worse by the fact that the familiar Manager API isn't quite what it seems...

The `Manager` API is a lie

QuerySet methods are chainable. Each call to a QuerySet method (such as filter) returns a cloned version of the original queryset, ready for another method to be called. This fluent interface is part of the beauty of Django's ORM.
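The two properties described above (lazy evaluation, and each method call returning a modified copy) can be sketched without Django at all. The LazyQuery class below is a hypothetical toy written for this illustration; it is not how Django's QuerySet is implemented, and it filters an in-memory list of dicts instead of building SQL.

```python
class LazyQuery:
    """Toy stand-in for a queryset: records filters, applies them on iteration."""

    def __init__(self, rows, predicates=()):
        self._rows = rows                      # the "table": a list of dicts
        self._predicates = tuple(predicates)   # filters recorded so far
        self.evaluations = 0                   # how often this object was evaluated

    def filter(self, **conditions):
        # Return a modified copy; the object we were called on is left untouched.
        def predicate(row):
            return all(row[key] == value for key, value in conditions.items())
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def __iter__(self):
        # The real work happens only here, when results are actually consumed.
        self.evaluations += 1
        for row in self._rows:
            if all(check(row) for check in self._predicates):
                yield row


rows = [
    {"content": "write talk", "is_done": False, "priority": 1},
    {"content": "buy milk", "is_done": True, "priority": 1},
    {"content": "tidy desk", "is_done": False, "priority": 2},
]

base = LazyQuery(rows)
urgent = base.filter(is_done=False).filter(priority=1)  # nothing evaluated yet
evaluations_before = urgent.evaluations
results = list(urgent)                                  # evaluation happens here
```

Because filter never mutates the object it is called on, base can be reused as the starting point for other chains.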

But the fact that Model.objects is a Manager (not a QuerySet) presents a problem: we need to start our chain of method calls on objects, but continue the chain on the resulting QuerySet.

So how is this problem solved in Django's codebase? Here the API lie is exposed: all of the QuerySet methods are reimplemented on the Manager. The versions of these methods on the Manager simply proxy to a newly created QuerySet via self.get_query_set():

    class Manager(object):

        # SNIP some housekeeping stuff..

        def get_query_set(self):
            return QuerySet(self.model, using=self._db)

        def all(self):
            return self.get_query_set()

        def count(self):
            return self.get_query_set().count()

        def filter(self, *args, **kwargs):
            return self.get_query_set().filter(*args, **kwargs)

        # and so on for 100+ lines...

The Manager API is a "lie": every operation ultimately ends up on a QuerySet.

To see the full horror, take a look at the Manager source code.

We'll return to this API sleight-of-hand shortly...

Back to the todo list

So, let's get back to solving our problem of cleaning up a messy query API. The approach recommended by Django's documentation is to define custom Manager subclasses and attach them to your models.

You can either add multiple extra managers to a model, or you can redefine objects, maintaining a single manager but adding your own custom methods.

Let's try each of these approaches with our Todo application.

Approach 1: multiple custom Managers

    class IncompleteTodoManager(models.Manager):
        def get_query_set(self):
            return super(IncompleteTodoManager, self).get_query_set().filter(is_done=False)

    class HighPriorityTodoManager(models.Manager):
        def get_query_set(self):
            return super(HighPriorityTodoManager, self).get_query_set().filter(priority=1)

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        # other fields go here..

        objects = models.Manager() # the default manager

        # attach our custom managers:
        incomplete = IncompleteTodoManager()
        high_priority = HighPriorityTodoManager()

The API this gives us looks like this:

    >>> Todo.incomplete.all()
    >>> Todo.high_priority.all()

Unfortunately, there are several big problems with this approach.

The implementation is very verbose. You need to define an entire class for each custom piece of query functionality.
It clutters your model's namespace. Django developers are used to thinking of Model.objects as the "gateway" to the table. It's a namespace under which all table-level operations are collected. It'd be a shame to lose this clear convention.
Here's the real deal breaker: it's not chainable. There's no way of combining the managers: to get todos which are incomplete and high-priority, we're back to low-level ORM code: either Todo.incomplete.filter(priority=1) or Todo.high_priority.filter(is_done=False).

Using multiple managers is a fairly direct way to encapsulate different pieces of logic, but it bloats the code, spoils the tidy namespace that objects provides, and worst of all does not support chaining (the method is called on a Manager, but what comes back is a QuerySet).

I think these downsides completely outweigh any benefits of this approach, and having multiple managers on a model is almost always a bad idea.

Approach 2: Manager methods

So, let's try the other Django-sanctioned approach: multiple methods on a single custom Manager.

    class TodoManager(models.Manager):
        def incomplete(self):
            return self.filter(is_done=False)

        def high_priority(self):
            return self.filter(priority=1)

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        # other fields go here..

        objects = TodoManager()

Our API now looks like this:

    >>> Todo.objects.incomplete()
    >>> Todo.objects.high_priority()

This is better. It's much less verbose (only one class definition) and the query methods remain namespaced nicely under objects.

It's still not chainable, though. Todo.objects.incomplete() returns an ordinary QuerySet, so we can't then call Todo.objects.incomplete().high_priority(). We're stuck with Todo.objects.incomplete().filter(priority=1). Not much use.

Putting all the operations on a single Manager fixes the bloat and the namespace problem, but chaining is still not supported.
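To see why the chain breaks, here is a minimal pure-Python sketch. ToyQuerySet and ToyTodoManager are hypothetical stand-ins invented for this illustration (they filter a list of dicts and are not Django classes), but they reproduce the shape of the problem: the custom methods live on the manager, while what comes back from them is a plain queryset.

```python
class ToyQuerySet:
    """Stand-in for an ordinary QuerySet: it only knows the generic filter()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        matching = [
            row for row in self.rows
            if all(row[key] == value for key, value in conditions.items())
        ]
        return ToyQuerySet(matching)   # always a plain ToyQuerySet


class ToyTodoManager:
    """Stand-in for the custom Manager from Approach 2."""

    def __init__(self, rows):
        self.rows = rows

    def filter(self, **conditions):
        # Like Django's Manager, proxy to a freshly created queryset.
        return ToyQuerySet(self.rows).filter(**conditions)

    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


objects = ToyTodoManager([
    {"content": "write talk", "is_done": False, "priority": 1},
    {"content": "tidy desk", "is_done": False, "priority": 2},
])

first_step = objects.incomplete()    # fine, but this is now a ToyQuerySet
try:
    first_step.high_priority()       # the custom method exists on the manager only
    chained = True
except AttributeError:
    chained = False                  # this is the branch that runs
```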

Approach 3: custom QuerySet

Now we're in uncharted territory. You won't find this in Django's documentation...

    class TodoQuerySet(models.query.QuerySet):
        def incomplete(self):
            return self.filter(is_done=False)

        def high_priority(self):
            return self.filter(priority=1)

    class TodoManager(models.Manager):
        def get_query_set(self):
            return TodoQuerySet(self.model, using=self._db)

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        # other fields go here..

        objects = TodoManager()

Here's what this looks like from the point of view of code that calls it:

    >>> Todo.objects.get_query_set().incomplete()
    >>> Todo.objects.get_query_set().high_priority()
    >>> # (or)
    >>> Todo.objects.all().incomplete()
    >>> Todo.objects.all().high_priority()

We're nearly there! This is not much more verbose than Approach 2, gives the same benefits, and additionally (drumroll please...) it's chainable!

    >>> Todo.objects.all().incomplete().high_priority()

However, it's still not perfect. The custom Manager is nothing more than boilerplate, and that all() is a wart, which is annoying to type but more importantly is inconsistent - it makes our code look weird.

Having the manager return a custom queryset, and encapsulating the operations on that queryset, solves essentially all of the problems above. One minor wart remains: the all()/get_query_set() call looks redundant.

Approach 3a: copy Django, proxy everything

Now our discussion of the "Manager API lie" above becomes useful: we know how to fix this problem. We simply redefine all of our QuerySet methods on the Manager, and proxy them back to our custom QuerySet:

    class TodoQuerySet(models.query.QuerySet):
        def incomplete(self):
            return self.filter(is_done=False)

        def high_priority(self):
            return self.filter(priority=1)

    class TodoManager(models.Manager):
        def get_query_set(self):
            return TodoQuerySet(self.model, using=self._db)

        def incomplete(self):
            return self.get_query_set().incomplete()

        def high_priority(self):
            return self.get_query_set().high_priority()

This gives us exactly the API we want:

    >>> Todo.objects.incomplete().high_priority() # yay!

Except that's a lot of typing, and very un-DRY. Every time you add a new method to your QuerySet, or change the signature of an existing method, you have to remember to make the same change on your Manager, or it won't work properly. This is a recipe for problems.

To keep the surface free of clutter, we have to be repetitive behind the scenes: every queryset method is rewritten on the manager as a proxy.

Approach 3b: django-model-utils

Python is a dynamic language. Surely we can avoid all this boilerplate? It turns out we can, with a little help from a third-party app called django-model-utils. Just run pip install django-model-utils, then..

    from model_utils.managers import PassThroughManager

    class TodoQuerySet(models.query.QuerySet):
        def incomplete(self):
            return self.filter(is_done=False)

        def high_priority(self):
            return self.filter(priority=1)

    class Todo(models.Model):
        content = models.CharField(max_length=100)
        # other fields go here..

        objects = PassThroughManager.for_queryset_class(TodoQuerySet)()

This is much nicer. We simply define our custom QuerySet subclass as before, and attach it to our model via the PassThroughManager class provided by django-model-utils.

The most convenient option is django-model-utils: it solves all the problems mentioned above, and the code stays very concise.

The PassThroughManager works by implementing the __getattr__ method, which intercepts calls to non-existing methods and automatically proxies them to the QuerySet. There's a bit of careful checking to ensure that we don't get infinite recursion on some properties (which is why I recommend using the tried-and-tested implementation supplied by django-model-utils rather than hand-rolling your own).
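The __getattr__ technique can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the code from django-model-utils: ToyQuerySet and ToyPassThroughManager are hypothetical names, the "table" is a list of dicts, and the recursion guard shown here (refusing underscore-prefixed names) is just one simple way to do it.

```python
class ToyQuerySet:
    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        matching = [
            row for row in self.rows
            if all(row[key] == value for key, value in conditions.items())
        ]
        # type(self) preserves subclasses, so custom methods survive chaining.
        return type(self)(matching)


class TodoQuerySet(ToyQuerySet):
    def incomplete(self):
        return self.filter(is_done=False)

    def high_priority(self):
        return self.filter(priority=1)


class ToyPassThroughManager:
    def __init__(self, queryset_class, rows):
        self._queryset_class = queryset_class
        self._rows = rows

    def get_query_set(self):
        return self._queryset_class(self._rows)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup has failed.
        # Refuse underscore-prefixed names: if _queryset_class is not set yet
        # (for example while an instance is being copied), looking it up here
        # would re-enter __getattr__ and recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get_query_set(), name)


objects = ToyPassThroughManager(TodoQuerySet, [
    {"content": "write talk", "is_done": False, "priority": 1},
    {"content": "buy milk", "is_done": True, "priority": 1},
    {"content": "tidy desk", "is_done": False, "priority": 2},
])

# No incomplete() is defined on the manager; __getattr__ forwards the call.
urgent = objects.incomplete().high_priority()
```

Adding a new method to TodoQuerySet makes it available on the manager automatically, which is exactly the duplication that Approach 3a required by hand.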

How does this help?

Remember that view code from earlier?

    def dashboard(request):
        todos = Todo.objects.filter(
            owner=request.user
        ).filter(
            is_done=False
        ).filter(
            priority=1
        )
        return render(request, 'todos/list.html', {
            'todos': todos,
        })

With a bit of work, we could make it look something like this:

    def dashboard(request):
        todos = Todo.objects.for_user(
            request.user
        ).incomplete().high_priority()
        return render(request, 'todos/list.html', {
            'todos': todos,
        })

Hopefully you'll agree that this second version is much simpler, clearer and more readable than the first.
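The for_user method used above is not defined anywhere earlier in this article. As a rough sketch of what that "bit of work" might involve, here is a pure-Python toy in which the same three-method chain serves both the dashboard and the weekly email job from earlier. Everything in it is an assumption made for illustration: the in-memory TodoQuerySet, the _where helper, the sample data, and the two caller functions. In a real Django project, for_user would presumably be a one-line filter on the owner field.

```python
class TodoQuerySet:
    """In-memory stand-in for a custom queryset (not a Django class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def _where(self, **conditions):
        # Hypothetical helper playing the role of the ORM's filter().
        return TodoQuerySet(
            row for row in self.rows
            if all(row[key] == value for key, value in conditions.items())
        )

    def for_user(self, user):
        return self._where(owner=user)

    def incomplete(self):
        return self._where(is_done=False)

    def high_priority(self):
        return self._where(priority=1)


ALL_TODOS = TodoQuerySet([
    {"owner": "alice", "content": "write talk", "is_done": False, "priority": 1},
    {"owner": "alice", "content": "buy milk", "is_done": True, "priority": 1},
    {"owner": "bob", "content": "book venue", "is_done": False, "priority": 1},
    {"owner": "bob", "content": "tidy desk", "is_done": False, "priority": 2},
])


def dashboard_items(user):
    # Caller 1: the dashboard view.
    todos = ALL_TODOS.for_user(user).incomplete().high_priority()
    return [todo["content"] for todo in todos.rows]


def weekly_email_body(user):
    # Caller 2: the weekly email job reuses the same domain-level chain.
    todos = ALL_TODOS.for_user(user).incomplete().high_priority()
    return "Still to do: " + ", ".join(todo["content"] for todo in todos.rows)
```

Neither caller mentions is_done or priority, so replacing the boolean with a status field would only require changes inside TodoQuerySet.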

Can Django help?

Ways of making this whole thing easier have been discussed on the django-dev mailing list, and there's an associated ticket. Zachary Voase proposed the following:

    class TodoManager(models.Manager):
        @models.querymethod
        def incomplete(query):
            return query.filter(is_done=False)

This single decorated method definition would make incomplete magically available on both the Manager and the QuerySet.

Personally, I'm not completely convinced by the decorator-based idea. It obscures the details slightly, and feels a little "hacky". My gut feeling is that adding methods to a QuerySet subclass (rather than a Manager subclass) is a better, simpler approach.

Perhaps we could go further. By stepping back and re-examining Django's API design decisions from scratch, maybe we could make real, deep improvements. Can the distinction between Managers and QuerySets be removed (or at least clarified)?

I'm fairly sure that if a major reworking like that ever did happen, it would have to be in Django 2.0 or beyond.

So, to recap:

Using raw ORM query code in views and other high-level parts of your application is (usually) a bad idea. Instead, creating custom QuerySet APIs and attaching them to your models with a PassThroughManager from django-model-utils gives you the following benefits:

Makes code less verbose, and more robust.
Increases DRYness, raises abstraction level.
Pushes business logic into the domain model layer where it belongs.

Thanks for reading!

0 0