Advanced Python Programming: The sqlite3 Database Module (Part 2)


1. Transactions

One of the key features of relational databases is the use of transactions to maintain a consistent internal state. With transactions enabled, several changes can be made through one connection without affecting any other users until the results are committed and flushed to the actual database.

Preserving Changes

Changes to the database, whether made with insert or update statements, need to be saved by explicitly calling commit(). This requirement gives an application the opportunity to make several related changes together and save them "atomically" rather than incrementally, which avoids the situation where clients connected to the database at the same time see only a partial update. The effect of calling commit() can be seen with a program that uses several database connections. A new row is inserted with the first connection, and then two attempts are made to read it back using separate connections.
import sqlite3

db_filename = 'todo.db'

def show_projects(conn):
    cursor = conn.cursor()
    cursor.execute('select name, description from project')
    for name, desc in cursor.fetchall():
        print '  ', name
    return

with sqlite3.connect(db_filename) as conn1:
    print 'Before changes:'
    show_projects(conn1)

    # Insert in one cursor
    cursor1 = conn1.cursor()
    cursor1.execute("""
    insert into project (name, description, deadline)
    values ('virtualenvwrapper', 'Virtualenv Extension', '2011-01-01')
    """)

    print '\nAfter changes in conn1:'
    show_projects(conn1)

    # Select from another connection, without committing first
    print '\nBefore commit:'
    with sqlite3.connect(db_filename) as conn2:
        show_projects(conn2)

    # Commit then select from another connection
    conn1.commit()
    print '\nAfter commit:'
    with sqlite3.connect(db_filename) as conn3:
        show_projects(conn3)
If the changes made through conn1 are not committed with commit(), they are visible only to conn1; the other connections cannot see them. Only after commit() is called do the other connections see the change.
$ python transaction_commit.py
Before changes:
   pymotw

After changes in conn1:
   pymotw
   virtualenvwrapper

Before commit:
   pymotw

After commit:
   pymotw
   virtualenvwrapper

Discarding Changes

Uncommitted changes can be discarded entirely with rollback(). The commit() and rollback() methods are usually called from different parts of the same try:except block, with errors triggering a rollback.
import sqlite3

db_filename = 'todo.db'

def show_projects(conn):
    cursor = conn.cursor()
    cursor.execute('select name, description from project')
    for name, desc in cursor.fetchall():
        print '    ', name
    return

with sqlite3.connect(db_filename) as conn:
    print 'Before changes:'
    show_projects(conn)

    try:
        # Delete the row added by the previous example
        cursor = conn.cursor()
        cursor.execute("""delete from project
        where name = 'virtualenvwrapper'
        """)

        # Show the state inside the transaction
        print '\nAfter delete:'
        show_projects(conn)

        # Pretend the processing caused an error
        raise RuntimeError('simulated error')

    except Exception, err:
        # Discard the changes
        print 'ERROR:', err
        conn.rollback()
    else:
        # Save the changes
        conn.commit()

    # Show the results
    print '\nAfter rollback:'
    show_projects(conn)
After rollback() is called, the change made to the database no longer exists.
$ python transaction_rollback.py
Before changes:
     pymotw
     virtualenvwrapper

After delete:
     pymotw
ERROR: simulated error

After rollback:
     pymotw
     virtualenvwrapper


2. Isolation Levels

sqlite3 supports three locking modes, called isolation levels, which control the technique used to avoid incompatible changes between connections. The isolation level is set by passing a string as the isolation_level argument when a connection is opened, so different connections can use different values.
The program below demonstrates how different isolation levels affect the order of events in threads using separate connections to the same database. Four threads are created. Two threads write changes to the database by updating existing rows, and the other two attempt to read all of the rows from the task table.
import logging
import sqlite3
import sys
import threading
import time

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s (%(threadName)-10s) %(message)s',
    )

db_filename = 'todo.db'
isolation_level = sys.argv[1]

def writer():
    with sqlite3.connect(db_filename,
            isolation_level=isolation_level) as conn:
        cursor = conn.cursor()
        cursor.execute('update task set priority = priority + 1')
        logging.debug('waiting to synchronize')
        ready.wait()  # synchronize threads
        logging.debug('PAUSING')
        time.sleep(1)
        conn.commit()
        logging.debug('CHANGES COMMITTED')
    return

def reader():
    with sqlite3.connect(db_filename,
            isolation_level=isolation_level) as conn:
        cursor = conn.cursor()
        logging.debug('waiting to synchronize')
        ready.wait()  # synchronize threads
        logging.debug('wait over')
        cursor.execute('select * from task')
        logging.debug('SELECT EXECUTED')
        results = cursor.fetchall()
        logging.debug('result fetched')
    return

if __name__ == '__main__':
    ready = threading.Event()

    threads = [
        threading.Thread(name='Reader 1', target=reader),
        threading.Thread(name='Reader 2', target=reader),
        threading.Thread(name='Writer 1', target=writer),
        threading.Thread(name='Writer 2', target=writer),
        ]

    [t.start() for t in threads]

    time.sleep(1)
    logging.debug('setting ready')
    ready.set()

    [t.join() for t in threads]
The threads are synchronized using an Event from the threading module. The writer() function connects and makes its change to the database, but does not commit until the event fires. The reader() function connects, then waits to query the database until after the synchronization event occurs.

Deferred

The default isolation level is DEFERRED. Deferred mode locks the database, but only once a modification actually begins. All of the previous examples use deferred mode.
$ python isolation_levels.py DEFERRED
2015-01-08 23:17:29,144 (Reader 1  ) waiting to synchronize
2015-01-08 23:17:29,144 (Reader 2  ) waiting to synchronize
2015-01-08 23:17:29,145 (Writer 1  ) waiting to synchronize
2015-01-08 23:17:30,146 (MainThread) setting ready
2015-01-08 23:17:30,147 (Reader 1  ) wait over
2015-01-08 23:17:30,147 (Reader 1  ) SELECT EXECUTED
2015-01-08 23:17:30,147 (Reader 1  ) result fetched
2015-01-08 23:17:30,147 (Reader 2  ) wait over
2015-01-08 23:17:30,148 (Reader 2  ) SELECT EXECUTED
2015-01-08 23:17:30,148 (Reader 2  ) result fetched
2015-01-08 23:17:30,148 (Writer 1  ) PAUSING
2015-01-08 23:17:31,155 (Writer 1  ) CHANGES COMMITTED
2015-01-08 23:17:31,178 (Writer 2  ) waiting to synchronize
2015-01-08 23:17:31,178 (Writer 2  ) PAUSING
2015-01-08 23:17:32,183 (Writer 2  ) CHANGES COMMITTED

Immediate

Immediate mode locks the database as soon as a modification starts, preventing other cursors from changing the database until the transaction is committed. It is suitable for a database with complicated writes but more readers than writers, since readers are not blocked while a transaction is in progress.
$ python isolation_levels.py IMMEDIATE
2015-01-08 23:41:21,768 (Reader 2  ) waiting to synchronize
2015-01-08 23:41:21,769 (Writer 1  ) waiting to synchronize
2015-01-08 23:41:21,769 (Reader 1  ) waiting to synchronize
2015-01-08 23:41:22,769 (MainThread) setting ready
2015-01-08 23:41:22,769 (Reader 2  ) wait over
2015-01-08 23:41:22,770 (Reader 2  ) SELECT EXECUTED
2015-01-08 23:41:22,770 (Reader 2  ) result fetched
2015-01-08 23:41:22,770 (Writer 1  ) PAUSING
2015-01-08 23:41:22,770 (Reader 1  ) wait over
2015-01-08 23:41:22,770 (Reader 1  ) SELECT EXECUTED
2015-01-08 23:41:22,771 (Reader 1  ) result fetched
2015-01-08 23:41:23,776 (Writer 1  ) CHANGES COMMITTED
2015-01-08 23:41:23,801 (Writer 2  ) waiting to synchronize
2015-01-08 23:41:23,801 (Writer 2  ) PAUSING
2015-01-08 23:41:24,807 (Writer 2  ) CHANGES COMMITTED

Exclusive

Exclusive mode locks the database to all readers and writers. Its use should be limited where database performance matters, because each exclusive connection blocks all other users.
$ python isolation_levels.py EXCLUSIVE
2015-01-09 02:52:04,412 (Reader 1  ) waiting to synchronize
2015-01-09 02:52:04,412 (Reader 2  ) waiting to synchronize
2015-01-09 02:52:04,413 (Writer 1  ) waiting to synchronize
2015-01-09 02:52:05,415 (MainThread) setting ready
2015-01-09 02:52:05,415 (Reader 1  ) wait over
2015-01-09 02:52:05,415 (Reader 2  ) wait over
2015-01-09 02:52:05,415 (Writer 1  ) PAUSING
2015-01-09 02:52:06,423 (Writer 1  ) CHANGES COMMITTED
2015-01-09 02:52:06,446 (Reader 1  ) SELECT EXECUTED
2015-01-09 02:52:06,446 (Reader 1  ) result fetched
2015-01-09 02:52:06,446 (Reader 2  ) SELECT EXECUTED
2015-01-09 02:52:06,446 (Reader 2  ) result fetched
2015-01-09 02:52:06,447 (Writer 2  ) waiting to synchronize
2015-01-09 02:52:06,447 (Writer 2  ) PAUSING
2015-01-09 02:52:07,453 (Writer 2  ) CHANGES COMMITTED
Because the first writer has already started making changes, the readers and the second writer block until it commits. The sleep() call introduces an artificial delay in the writer thread to highlight the fact that the other connections are blocked.

Autocommit

The connection's isolation_level parameter can also be set to None to enable autocommit mode. With autocommit enabled, each execute() call is committed automatically as soon as the statement finishes. Autocommit mode is well suited for short transactions, such as inserting a small amount of data into a single table; the database is locked for as little time as possible, so there is less chance of contention between threads.
In autocommit.py the explicit calls to commit() are removed and the isolation level is set to None; otherwise it is identical to isolation_levels.py. The output is different, however, because both writer threads finish their work before either reader starts querying. The script itself is not reproduced here; a minimal sketch of the change it describes follows.
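The following is only an assumed excerpt of autocommit.py, showing the writer() function with the isolation level set to None and the commit() call dropped; the rest of the script is taken to be unchanged from isolation_levels.py.

# Minimal sketch of the autocommit variation described above (assumed
# excerpt of autocommit.py); the rest of the script is unchanged from
# isolation_levels.py.
isolation_level = None  # autocommit mode

def writer():
    with sqlite3.connect(db_filename,
            isolation_level=isolation_level) as conn:
        cursor = conn.cursor()
        # With autocommit enabled, this update is committed as soon as
        # execute() finishes, so there is no conn.commit() call.
        cursor.execute('update task set priority = priority + 1')
        logging.debug('waiting to synchronize')
        ready.wait()  # synchronize threads
        logging.debug('PAUSING')
        time.sleep(1)
    return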
$ python autocommit.py None
2015-01-10 07:11:58,890 (Reader 1  ) waiting to synchronize
2015-01-10 07:11:58,891 (Reader 2  ) waiting to synchronize
2015-01-10 07:11:58,907 (Writer 1  ) waiting to synchronize
2015-01-10 07:11:58,913 (Writer 2  ) waiting to synchronize
2015-01-10 07:11:59,893 (MainThread) setting ready
2015-01-10 07:11:59,893 (Reader 1  ) wait over
2015-01-10 07:11:59,894 (Reader 1  ) SELECT EXECUTED
2015-01-10 07:11:59,894 (Reader 1  ) result fetched
2015-01-10 07:11:59,893 (Reader 2  ) wait over
2015-01-10 07:11:59,895 (Reader 2  ) SELECT EXECUTED
2015-01-10 07:11:59,895 (Reader 2  ) result fetched
2015-01-10 07:11:59,895 (Writer 1  ) PAUSING
2015-01-10 07:11:59,895 (Writer 2  ) PAUSING

3. In-Memory Databases

SQLite supports managing an entire database in RAM instead of relying on a disk file. In-memory databases are useful for automated testing when the database does not need to be preserved between test runs, or for experimenting with a schema or other database features. To open an in-memory database, pass the string ':memory:' instead of a filename when creating the Connection. Each ':memory:' connection creates a separate database instance, so changes made by a cursor in one connection do not affect other connections.
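A small sketch (not one of the original example files) showing that each ':memory:' connection really is a separate database:

import sqlite3

# Each ':memory:' connection gets its own private database instance.
conn1 = sqlite3.connect(':memory:')
conn1.execute('create table task (details text)')
conn1.execute("insert into task values ('write about sqlite3')")

conn2 = sqlite3.connect(':memory:')
try:
    conn2.execute('select * from task')
except sqlite3.OperationalError, err:
    # The table only exists in conn1's database, not conn2's.
    print 'ERROR:', err

conn1.close()
conn2.close()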

4. Exporting the Contents of a Database

The contents of an in-memory database can be saved using the Connection's iterdump() method. iterdump() returns an iterator that produces a series of strings which together build the SQL instructions needed to recreate the state of the database.
import sqlite3

schema_filename = 'todo_schema.sql'

with sqlite3.connect(':memory:') as conn:
    conn.row_factory = sqlite3.Row

    print 'Creating schema'
    with open(schema_filename, 'rt') as f:
        schema = f.read()
    conn.executescript(schema)

    print 'Inserting initial data'
    conn.execute("""
    insert into project (name, description, deadline)
    values ('pymotw', 'Python Module of the Week', '2015-01-10')
    """)
    data = [
        ('write about select', 'done', '2015-01-11', 'pymotw'),
        ('write about random', 'waiting', '2015-01-12', 'pymotw'),
        ('write about sqlite3', 'active', '2015-01-10', 'pymotw'),
        ]
    conn.executemany("""
    insert into task (details, status, deadline, project)
    values (?, ?, ?, ?)
    """, data)

    print 'Dumping:'
    for text in conn.iterdump():
        print text
iterdump() can also be used with databases saved to files, but it is most useful for preserving a database that would not otherwise be saved. The output shown here has been edited slightly so that it fits on the page while remaining syntactically correct.

Creating schema
Inserting initial data
Dumping:
BEGIN TRANSACTION;
CREATE TABLE project(
        name text primary key,
        description text,
        deadline date
        );
INSERT INTO "project" VALUES('pymotw','Python Module of the Week','2015-01-10');
CREATE TABLE task(
        id integer primary key autoincrement not null,
        priority integer default 1,
        details text,
        status text,
        deadline date,
        completed_on date,
        project text not null references project(name)
        );
INSERT INTO "task" VALUES(1,1,'write about select','done','2015-01-11',NULL,'pymotw');
INSERT INTO "task" VALUES(2,1,'write about random','waiting','2015-01-12',NULL,'pymotw');
INSERT INTO "task" VALUES(3,1,'write about sqlite3','active','2015-01-10',NULL,'pymotw');
DELETE FROM "sqlite_sequence";
INSERT INTO "sqlite_sequence" VALUES('task',3);
COMMIT;
The contents of todo_schema.sql are as follows:
create table project(
        name text primary key,
        description text,
        deadline date
        );

create table task(
        id integer primary key autoincrement not null,
        priority integer default 1,
        details text,
        status text,
        deadline date,
        completed_on date,
        project text not null references project(name)
        );
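As a side note, one way to preserve an in-memory database is to write the strings produced by iterdump() to a file and replay them later with executescript(). The following is only a minimal sketch; the file name dump.sql and the tiny one-table schema are made up for illustration.

import sqlite3

# Build a throwaway in-memory database (hypothetical one-column schema).
conn = sqlite3.connect(':memory:')
conn.execute('create table task (details text)')
conn.execute("insert into task values ('write about sqlite3')")

# Write the dump produced by iterdump() to a file...
with open('dump.sql', 'wt') as f:
    for line in conn.iterdump():
        f.write('%s\n' % line)
conn.close()

# ...and replay it later into a fresh database.
restored = sqlite3.connect(':memory:')
with open('dump.sql', 'rt') as f:
    restored.executescript(f.read())
print restored.execute('select details from task').fetchone()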

5. Using Python Functions in SQL

SQL syntax supports calling functions during queries, either in the column list or in the where clause of a select statement. This feature makes it possible to process data before it is returned from a query, and it can be used to convert between formats, perform calculations that would be awkward in pure SQL, and reuse application code.
import sqlite3

db_filename = 'todo.db'

def encrypt(s):
    print 'Encrypting %r' % s
    return s.encode('rot-13')

def decrypt(s):
    print 'Decrypting %r' % s
    return s.encode('rot-13')

with sqlite3.connect(db_filename) as conn:
    conn.create_function('encrypt', 1, encrypt)
    conn.create_function('decrypt', 1, decrypt)
    cursor = conn.cursor()

    # Raw values
    print 'Original values:'
    query = 'select id, details from task'
    cursor.execute(query)
    for row in cursor.fetchall():
        print row

    print '\nEncrypting...'
    query = 'update task set details = encrypt(details)'
    cursor.execute(query)

    print '\nDecrypting in query...'
    query = 'select id, decrypt(details) from task'
    cursor.execute(query)
    for row in cursor.fetchall():
        print row
Functions are exposed with the Connection's create_function() method. The parameters are the name of the function (as it should be used from within SQL), the number of arguments the function takes, and the Python function to expose.
$ python create_function.py
Original values:
(1, u'write about select')
(2, u'write about random')
(3, u'write about sqlite3')
(4, u'finish revieing markup')
(5, u'revise chapter intros')
(6, u'subtitle')

Encrypting...
Encrypting u'write about select'
Encrypting u'write about random'
Encrypting u'write about sqlite3'
Encrypting u'finish revieing markup'
Encrypting u'revise chapter intros'
Encrypting u'subtitle'

Decrypting in query...
Decrypting u'jevgr nobhg fryrpg'
Decrypting u'jevgr nobhg enaqbz'
Decrypting u'jevgr nobhg fdyvgr3'
Decrypting u'svavfu erivrvat znexhc'
Decrypting u'erivfr puncgre vagebf'
Decrypting u'fhogvgyr'
(1, u'write about select')
(2, u'write about random')
(3, u'write about sqlite3')
(4, u'finish revieing markup')
(5, u'revise chapter intros')
(6, u'subtitle')

6. Custom Aggregation

An aggregation function collects many individual pieces of data and summarizes them in some way. Examples of built-in aggregation functions are avg() (average), min(), max(), and count().
The API for aggregators used by sqlite3 is defined in terms of a class with two methods. The step() method is called once for each data value as the query is processed. The finalize() method is called once at the end of the query and should return the aggregate value. The following example implements an aggregator for the arithmetic mode: it returns the value that appears most frequently in the input.
import sqlite3
import collections

db_filename = 'todo.db'

class Mode(object):
    def __init__(self):
        self.counter = collections.Counter()

    def step(self, value):
        print 'step(%r)' % value
        self.counter[value] += 1

    def finalize(self):
        result, count = self.counter.most_common(1)[0]
        print 'finalize() -> %r (%d times)' % (result, count)
        return result

with sqlite3.connect(db_filename) as conn:
    conn.create_aggregate('mode', 1, Mode)
    cursor = conn.cursor()
    cursor.execute("""
            select mode(deadline) from task where project = 'pymotw'
            """)
    row = cursor.fetchone()
    print 'mode(deadline) is:', row[0]
The aggregator class is registered with the Connection's create_aggregate() method. The parameters are the name of the function (as it should be used from within SQL), the number of arguments the step() method takes, and the class to use.
$ python create_aggregate.py
step(u'2014-12-05')
step(u'2015-01-01')
step(u'2015-02-01')
step(u'2014-10-02')
step(u'2014-11-02')
step(u'2014-11-02')
finalize() -> u'2014-11-02' (2 times)
mode(deadline) is: 2014-11-02

7. Custom Sorting

A collation is a comparison function used in the order by section of an SQL query. Custom collations can be used to compare data types that SQLite cannot sort internally. For example, a custom collation is needed to sort the pickled objects saved by custom_type.py.
import sqlite3
try:
    import cPickle as pickle
except:
    import pickle

db_filename = 'todo.db'

def adapter_func(obj):
    return pickle.dumps(obj)

def converter_func(data):
    return pickle.loads(data)

class MyObj(object):
    def __init__(self, arg):
        self.arg = arg
    def __str__(self):
        return 'MyObj(%r)' % self.arg
    def __cmp__(self, other):
        return cmp(self.arg, other.arg)

# Register the functions for manipulating the type.
sqlite3.register_adapter(MyObj, adapter_func)
sqlite3.register_converter("MyObj", converter_func)

def collation_func(a, b):
    a_obj = converter_func(a)
    b_obj = converter_func(b)
    print 'collation_func(%s, %s)' % (a_obj, b_obj)
    return cmp(a_obj, b_obj)

with sqlite3.connect(db_filename,
        detect_types=sqlite3.PARSE_DECLTYPES,
        ) as conn:
    # Define the collation
    conn.create_collation('unpickle', collation_func)

    # Clear the table and insert new values
    conn.execute('delete from obj')
    conn.executemany('insert into obj (data) values (?)',
        [(MyObj(x),) for x in xrange(5, 0, -1)],
        )

    # Query the database for the objects just saved
    print 'Querying:'
    cursor = conn.cursor()
    cursor.execute("""
    select id, data from obj order by data collate unpickle
    """)
    for obj_id, obj in cursor.fetchall():
        print obj_id, obj
The arguments passed to the collation function are byte strings, so they must be unpickled and converted back into MyObj instances before the comparison can be performed. The results are:
$ python create_collation.py
Querying:
collation_func(MyObj(2), MyObj(1))
collation_func(MyObj(4), MyObj(3))
collation_func(MyObj(3), MyObj(1))
collation_func(MyObj(3), MyObj(2))
collation_func(MyObj(5), MyObj(1))
collation_func(MyObj(5), MyObj(2))
collation_func(MyObj(5), MyObj(3))
collation_func(MyObj(5), MyObj(4))
7 MyObj(1)
6 MyObj(2)
5 MyObj(3)
4 MyObj(4)
3 MyObj(5)

8. Threading and Connection Sharing

For historical reasons having to do with old versions of SQLite, Connection objects cannot be shared between threads. Each thread must create its own connection to the database.
import sqlite3
import sys
import threading
import time

db_filename = 'todo.db'
isolation_level = None  # autocommit mode

def reader(conn):
    my_name = threading.currentThread().name
    print 'Starting thread'
    try:
        cursor = conn.cursor()
        cursor.execute('select * from task')
        results = cursor.fetchall()
        print 'results fetched'
    except Exception, err:
        print 'ERROR:', err
    return

if __name__ == '__main__':
    with sqlite3.connect(db_filename,
            isolation_level=isolation_level,
            ) as conn:
        t = threading.Thread(name='Reader 1',
                             target=reader,
                             args=(conn,),
                             )
        t.start()
        t.join()
Attempting to share a connection between threads results in an exception:
Starting thread
ERROR: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140473086383872 and this is thread id 140473051047680
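A minimal sketch of the pattern the text calls for, where each thread opens its own Connection rather than sharing one (a hypothetical variation of the script above):

import sqlite3
import threading

db_filename = 'todo.db'

def reader():
    # Each thread creates its own Connection instead of sharing one.
    with sqlite3.connect(db_filename) as conn:
        cursor = conn.cursor()
        cursor.execute('select * from task')
        results = cursor.fetchall()
        print '%s: %d rows fetched' % (
            threading.currentThread().name, len(results))
    return

if __name__ == '__main__':
    threads = [threading.Thread(name='Reader %d' % i, target=reader)
               for i in range(2)]
    [t.start() for t in threads]
    [t.join() for t in threads]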

9. Restricting Access to Data

Although SQLite does not have the user access controls found in other, larger relational databases, it does provide a mechanism for limiting access to columns. Each connection can install an authorizer function that grants or denies access at runtime based on whatever policy is needed. The authorizer function is invoked while SQL statements are being parsed and is passed five arguments. The first is an action code indicating the type of operation being performed (reading, writing, deleting, and so on). The remaining arguments depend on the action code. For SQLITE_READ operations, they are the name of the table, the name of the column, the location in the SQL statement where the access occurs (main query, trigger, etc.), and None.
import sqlite3

db_filename = 'todo.db'

def authorizer_func(action, table, column, sql_location, ignore):
    print '\nauthorizer_func(%s, %s, %s, %s, %s)' % \
        (action, table, column, sql_location, ignore)

    response = sqlite3.SQLITE_OK  # be permissive by default

    if action == sqlite3.SQLITE_SELECT:
        print 'requesting permission to run a select statement'
        response = sqlite3.SQLITE_OK
    elif action == sqlite3.SQLITE_READ:
        print 'requesting access to column %s.%s from %s' % \
            (table, column, sql_location)
        if column == 'details':
            print '   ignoring details column'
            response = sqlite3.SQLITE_IGNORE
        elif column == 'priority':
            print '    preventing access to priority column'
            response = sqlite3.SQLITE_DENY

    return response

with sqlite3.connect(db_filename) as conn:
    conn.row_factory = sqlite3.Row
    conn.set_authorizer(authorizer_func)

    print 'Using SQLITE_IGNORE to mask a column value:'
    cursor = conn.cursor()
    cursor.execute("select id, details from task where project = 'pymotw'")
    for row in cursor.fetchall():
        print row['id'], row['details']

    print '\nUsing SQLITE_DENY to deny access to a column:'
    cursor.execute("""
    select id, priority from task where project = 'pymotw'
    """)
    for row in cursor.fetchall():
        print row['id'], row['priority']
sqlite3 provides the available action codes as constants, with names prefixed SQLITE_. Each type of SQL statement can be flagged, and access to individual columns can be controlled as well. SQLITE_OK means the column value is returned normally, SQLITE_IGNORE causes the column to be returned as NULL in the query results, and SQLITE_DENY blocks access to the column entirely. In this program the default response is SQLITE_OK, so values are returned normally; the details column is returned as NULL, and any query touching the priority column is refused.
$ python set_authorizer.py
Using SQLITE_IGNORE to mask a column value:

authorizer_func(21, None, None, None, None)
requesting permission to run a select statement

authorizer_func(20, task, id, main, None)
requesting access to column task.id from main

authorizer_func(20, task, details, main, None)
requesting access to column task.details from main
   ignoring details column

authorizer_func(20, task, project, main, None)
requesting access to column task.project from main
1 None
2 None
3 None
4 None
5 None
6 None

Using SQLITE_DENY to deny access to a column:

authorizer_func(21, None, None, None, None)
requesting permission to run a select statement

authorizer_func(20, task, id, main, None)
requesting access to column task.id from main

authorizer_func(20, task, priority, main, None)
requesting access to column task.priority from main
    preventing access to priority column
Traceback (most recent call last):
  File "set_authorizer.py", line 37, in <module>
    """)
sqlite3.DatabaseError: access to task.priority is prohibited









