如何快速更新数据库中的百万条数据
来源:互联网 发布:淘宝扣12分有什么影响 编辑:程序博客网 时间:2024/05/20 07:17
环境:ubuntu 14.04,Mysql 5.6,python 2.7。
本文使用python的MySQLdb这个库,MySQL Python tutorial这个教程很赞。
MySQLDBdb处理INSERT、UPDATE、DELETE时需要显式得commit()才能使更新有效,这也意味着commit之前的更新语句是放在一个事务中的。
For databases that support transactions, the Python interface silently starts a transaction when the cursor is created. The commit() method commits the updates made using that cursor, and the rollback() method discards them. Each method starts a new transaction.
对比如下:
方式1是单线程,where使用了索引,一个更新对应一个commit。
方式2是单线程,where没有使用索引,所有更新对应一个commit。
方式3是单线程,where使用了索引,所有更新对应一个commit。
方式4使用了2个进程来更新,where使用了索引,每个进程里的所有更新对应一个commit。
方式5使用了4个进程来更新,where使用了索引,每个进程里的所有更新对应一个commit。
方式6是在一个update语句里面更新所有的数据。
建立表
mysql> CREATE TABLE `test`.`number` ( `id` INT NOT NULL COMMENT '', `num` INT NULL COMMENT '', PRIMARY KEY (`id`) COMMENT '');mysql> show create table number \G*************************** 1. row *************************** Table: numberCreate Table: CREATE TABLE `number` ( `id` int(11) NOT NULL, `num` int(11) DEFAULT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=latin11 row in set (0.00 sec)
使用mysqldb插入数据(100*000条)
# coding: UTF-8__author__ = 'letian'import MySQLdbimport timehost='127.0.0.1'user='root'passwd='123456'db='test'port = 3306start_time = time.time()conn = MySQLdb.connect(host, user, passwd, db, port)cur = conn.cursor()for x in xrange(100*10000): # 0 到 999999 print x sql = "INSERT INTO number VALUES(%s,%s)" % (x, x) cur.execute(sql)conn.commit()end_time = time.time()print '用时:', end_time - start_time
运行结果:
...999997999998999999用时:219.332565069
挺快。
看下数据:
mysql> select count(*) from number;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (0.60 sec)mysql> select * from number limit 3;+----+------+| id | num |+----+------+| 0 | 0 || 1 | 1 || 2 | 2 |+----+------+3 rows in set (0.00 sec)
更新方式1:单线程,使用索引,多个commmit,更新1000条
# coding: UTF-8__author__ = 'letian'import MySQLdbimport timehost='127.0.0.1'user='root'passwd='123456'db='test'port = 3306start_time = time.time()conn = MySQLdb.connect(host, user, passwd, db, port)cur = conn.cursor()for x in xrange(1000): print x sql = "UPDATE number SET num=%s where id=%s" % (x+1, x) cur.execute(sql) conn.commit()cur.close()conn.close()end_time = time.time()print '用时:', end_time - start_time
输出:
...997998999用时:57.8318688869
1000条用时约60秒。
为什么慢?
先explain一下:
mysql> explain update number set num=3 where id=2 \G*************************** 1. row *************************** id: 1 select_type: SIMPLE table: number type: rangepossible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: Using where1 row in set (0.06 sec)
sql语句没什么问题。
再看一下磁盘IO:
letian $ iostat -d -k 1 100Linux 3.13.0-37-generic (myhost) 06/22/2015 _x86_64_ (4 CPU)............Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 57.00 4.00 468.00 4 468Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 71.00 28.00 404.00 28 404Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 71.00 28.00 608.00 28 608Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 61.00 4.00 492.00 4 492............Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 73.00 0.00 536.00 0 536Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 69.00 4.00 516.00 4 516............Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 61.00 4.00 472.00 4 472Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 63.00 4.00 452.00 4 452............Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 16.00 0.00 1804.00 0 1804Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 3.00 0.00 16.00 0 16Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 0.00 0.00 0.00 0 0Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 2.00 0.00 12.00 0 12
tps
代表每秒IO传输的次数,与下面的更新方式4
对比一下,可以看出当前的更新方式IO较多。(为什么呢?移步Why does TRANSACTION / COMMIT improve performance so much with PHP/MySQL (InnoDB)?)
更新方式2,单线程,不使用索引,一个commit,更新100条
# coding: UTF-8__author__ = 'letian'import MySQLdbimport timehost='127.0.0.1'user='root'passwd='123456'db='test'port = 3306start_time = time.time()conn = MySQLdb.connect(host, user, passwd, db, port)cur = conn.cursor()for x in xrange(100): print x sql = "UPDATE number SET num=%s where num=%s" % (x*2, x) cur.execute(sql)conn.commit()cur.close()conn.close()end_time = time.time()print '用时:', end_time - start_time
输出:
...979899用时: 89.695207119
为什么这么慢:
mysql> explain update number set num=4 where num=2 \G*************************** 1. row *************************** id: 1 select_type: SIMPLE table: number type: indexpossible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 979800 Extra: Using where1 row in set (0.05 sec)
可以看到,每次update几乎要扫全表(rows: 979800)。
更新方式3:单线程,使用索引,一个commit,更新40×10000条
# coding: UTF-8__author__ = 'letian'import MySQLdbimport timehost='127.0.0.1'user='root'passwd='123456'db='test'port = 3306start_time = time.time()conn = MySQLdb.connect(host, user, passwd, db, port)cur = conn.cursor()for x in xrange(40*10000): # 0 到 999999 print x sql = "UPDATE number SET num=%s where id=%s" % (x+1, x) cur.execute(sql)conn.commit()end_time = time.time()print '用时:', end_time - start_time
输出:
...399997399998399999用时:92.9413559437
查看磁盘IO:
letian $ iostat -d -k 1 100Linux 3.13.0-37-generic (myhost) 06/22/2015 _x86_64_ (4 CPU)Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 20.88 288.50 283.18 2458129 2412772Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 18.00 116.00 328.00 116 328Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 17.00 8.00 548.00 8 548Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 11.00 12.00 396.00 12 396Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 10.00 4.00 488.00 4 488............Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 10.00 4.00 460.00 4 460Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 10.00 4.00 448.00 4 448Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 10.00 4.00 464.00 4 464Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 10.00 4.00 456.00 4 456Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 9.00 4.00 452.00 4 452Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 9.00 0.00 140.00 0 140Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 15.00 4.00 2448.00 4 2448Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 43.00 144.00 2544.00 144 2544............Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 9.00 0.00 516.00 0 516Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 10.00 4.00 512.00 4 512Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 12.00 4.00 540.00 4 540Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 14.00 4.00 2288.00 4 2288Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 10.00 4.00 528.00 4 528Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 6.00 0.00 184.00 0 184............Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 14.00 0.00 3868.00 0 3868Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 15.00 0.00 2604.00 0 2604Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 15.00 0.00 1516.00 0 1516Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 3.00 0.00 16.00 0 16Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 12.00 0.00 68.00 0 68Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 4.00 0.00 28.00 0 28Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 0.00 0.00 0.00 0 0Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 0.00 0.00 0.00 0 0Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtnsda 0.00 0.00 0.00 0 0
更新方式4:2个进程,一个进程里一个commit,共更新40*10000条
两个进程:
# coding: UTF-8__author__ = 'letian'import MySQLdbimport timeimport multiprocessinghost='127.0.0.1'user='root'passwd='123456'db='test'port = 3306start_time = time.time()def update(int_iter): conn = MySQLdb.connect(host, user, passwd, db, port) cur = conn.cursor() for x in int_iter: print x sql = "UPDATE number SET num=%s where id=%s" % (x+5, x) cur.execute(sql) conn.commit() cur.close() conn.close()pool = multiprocessing.Pool(processes = 2)pool.apply_async(update, (xrange(20*10000), ))pool.apply_async(update, (xrange(20*10000, 40*10000), ))pool.close()pool.join()end_time = time.time()print '用时:', end_time - start_time
输出:
...199998199999用时: 47.9561839104
看下数据:
mysql> select * from number where id=1;+----+------+| id | num |+----+------+| 1 | 6 |+----+------+1 row in set (0.00 sec)mysql> select * from number where id=399999;+--------+--------+| id | num |+--------+--------+| 399999 | 400004 |+--------+--------+1 row in set (0.00 sec)
更新方式5:4个进程,一个进程里一个commit,共更新40*10000条
# coding: UTF-8__author__ = 'letian'import MySQLdbimport timeimport multiprocessinghost='127.0.0.1'user='root'passwd='123456'db='test'port = 3306start_time = time.time()def update(int_iter): conn = MySQLdb.connect(host, user, passwd, db, port) cur = conn.cursor() for x in int_iter: print x sql = "UPDATE number SET num=%s where id=%s" % (x+10, x) cur.execute(sql) conn.commit() cur.close() conn.close()pool = multiprocessing.Pool(processes = 4)pool.apply_async(update, (xrange(10*10000), ))pool.apply_async(update, (xrange(10*10000, 20*10000), ))pool.apply_async(update, (xrange(20*10000, 30*10000), ))pool.apply_async(update, (xrange(30*10000, 40*10000), ))pool.close()pool.join()end_time = time.time()print '用时:', end_time - start_time
运行结果:
299996299997299998299999用时: 28.4070010185
查看更新效果:
mysql> select * from number where id=1;+----+------+| id | num |+----+------+| 1 | 11 |+----+------+1 row in set (0.00 sec)mysql> select * from number where id=399999;+--------+--------+| id | num |+--------+--------+| 399999 | 400009 |+--------+--------+1 row in set (0.00 sec)
更新方式6:一次更新多行,更新40*10000条
# coding: UTF-8__author__ = 'letian'import MySQLdbimport timehost='127.0.0.1'user='root'passwd='123456'db='test'port = 3306start_time = time.time()conn = MySQLdb.connect(host, user, passwd, db, port)cur = conn.cursor()sql = "UPDATE number SET num=id+100 where id<%s" % (40*10000, )cur.execute(sql)conn.commit()cur.close()conn.close()end_time = time.time()print '用时:', end_time - start_time
输出:
用时: 3.27281808853
查看更新效果:
mysql> select * from number where id=1;+----+------+| id | num |+----+------+| 1 | 101 |+----+------+1 row in set (0.00 sec)mysql> select * from number where id=399999;+--------+--------+| id | num |+--------+--------+| 399999 | 400099 |+--------+--------+1 row in set (0.00 sec)
分析使用的sql语句:
mysql> explain UPDATE number SET num=id+100 where id<400000 \G*************************** 1. row *************************** id: 1 select_type: SIMPLE table: number type: rangepossible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 489900 Extra: Using where1 row in set (0.00 sec)
这种方式非常快,灵活性不够。
如果遇到这样一个场景:将id为0、1、2的num置为2,其他全部置为4,可以先全部置为4,然后在更新id为0、1、2的记录,这样速度会很快。
- 如何快速更新数据库中的百万条数据
- 如何利用jdbc快速插入百万条数据
- DataTable 快速导入数据库——百万条数据只需几秒!!(最新,添加了委托)
- poi9处理excel中的百万条数据
- 百万条数据分页
- 百万条数据分页
- 百万条数据分页
- 如何在地图上打点数百万条数据
- load data infile将excel文件中的数百万条数据在1分钟内导入数据库
- load data infile将excel文件中的数百万条数据在1分钟内导入数据库
- Magento-如何新增、更新或者删除数据库中的数据
- 百万条数据批量导入
- 百万条数据查询优化
- 查询百万级条数据
- 数据库插入百万数据
- 百万数据-优化数据库.
- 算法,PHP取数据库中百万条数据中随机20条记录
- 百万级别数据,数据库Mysql,Mongodb,Hbase如何选择?
- Sitemesh 3 的使用及配置
- Jenkins中使用过的Msbuild编译bat
- IO编程
- 编程之美2.7节求最大公约数问题
- 单例模式(一)
- 如何快速更新数据库中的百万条数据
- 用opengl实现轨迹圆|经典程序
- C/C++复习:向量的数量积
- 朴素贝叶斯的三个常用模型:高斯、多项式、伯努利
- Android SDK更新不了问题解决
- 设计模式之---中介者模式(Mediator Design Pattern)
- maven/gradle 打包后自动上传到nexus仓库---学习笔记
- Juju's lessons
- 算数题