Python crawler: creating a database connection pool in a Pipeline to batch-write to MySQL
Below is the description of the connection pool's parameters (from the PooledDB constructor):
def __init__(self, creator, mincached=0, maxcached=0,
             maxshared=0, maxconnections=0, blocking=False,
             maxusage=None, setsession=None, reset=True,
             failures=None, ping=1,
             *args, **kwargs):
    """Set up the DB-API 2 connection pool.

    creator: either an arbitrary function returning new DB-API 2
        connection objects or a DB-API 2 compliant database module
    mincached: initial number of idle connections in the pool
        (0 means no connections are made at startup)
    maxcached: maximum number of idle connections in the pool
        (0 or None means unlimited pool size)
    maxshared: maximum number of shared connections
        (0 or None means all connections are dedicated)
        When this maximum number is reached, connections are
        shared if they have been requested as shareable.
    maxconnections: maximum number of connections generally allowed
        (0 or None means an arbitrary number of connections)
    blocking: determines behavior when exceeding the maximum
        (if this is set to true, block and wait until the number of
        connections decreases, otherwise an error will be reported)
    maxusage: maximum number of reuses of a single connection
        (0 or None means unlimited reuse)
        When this maximum usage number of the connection is reached,
        the connection is automatically reset (closed and reopened).
    setsession: optional list of SQL commands that may serve to
        prepare the session, e.g. ["set datestyle to ...",
        "set time zone ..."]
    reset: how connections should be reset when returned to the pool
        (False or None to rollback transactions started with begin(),
        True to always issue a rollback for safety's sake)
    failures: an optional exception class or a tuple of exception
        classes for which the connection failover mechanism shall
        be applied, if the default (OperationalError, InternalError)
        is not adequate
    ping: determines when the connection should be checked with ping()
        (0 = None = never, 1 = default = whenever fetched from the pool,
        2 = when a cursor is created, 4 = when a query is executed,
        7 = always, and all other bit combinations of these values)
    args, kwargs: the parameters that shall be passed to the creator
        function or the connection constructor of the DB-API 2 module
    """
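To make the sizing parameters concrete, here is a minimal sketch that builds a pool and checks a connection out and back in. The credentials, database name, and the SELECT 1 probe are illustrative placeholders, not values from the article:

import MySQLdb
from DBUtils.PooledDB import PooledDB

pool = PooledDB(
    creator=MySQLdb,      # DB-API 2 module used to open raw connections
    mincached=2,          # open 2 idle connections at startup
    maxcached=5,          # keep at most 5 idle connections in the pool
    maxconnections=10,    # never allow more than 10 connections in total
    blocking=True,        # block instead of raising when the pool is exhausted
    ping=1,               # ping the server whenever a connection is fetched
    host='localhost', user='root', passwd='root',   # placeholder credentials
    db='vboxdb', port=3306, charset='utf8')

conn = pool.connection()  # borrow a connection from the pool
cursor = conn.cursor()
cursor.execute("SELECT 1")
cursor.close()
conn.close()              # hands the connection back to the pool for reuse

Note that close() on a pooled connection does not tear down the underlying database session; it only returns the connection to the pool.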
Here is the pipeline code. It buffers rows and writes them in batches of 1,000: the connection pool is created when the spider opens, and when the spider closes, the final (possibly partial) batch is written to the database and the connection pool is closed:
# -*- coding: utf-8 -*-
import sys

import MySQLdb
from DBUtils.PooledDB import PooledDB

# Python 2: force utf-8 as the default string encoding
default_encoding = 'utf-8'
if sys.getdefaultencoding() != default_encoding:
    reload(sys)
    sys.setdefaultencoding(default_encoding)


class MySQLStorePipeline(object):

    def open_spider(self, spider):
        # create the connection pool once, when the spider starts
        self.pool = PooledDB(creator=MySQLdb, maxcached=5, maxshared=5,
                             host='localhost', user='root', passwd='root',
                             db='vboxdb', port=3306, charset="utf8")
        self.conn = self.pool.connection()
        self.cursor = self.conn.cursor()
        self.pgs = []  # buffer for rows awaiting the next batch insert
        # empty the target table before the crawl
        self.cursor.execute("truncate table epg")
        self.conn.commit()

    def bulk_insert_to_mysql(self, bulkdata):
        # insert one batch of rows with a single executemany() call
        try:
            print "inserting data in batch--->>>>>", len(self.pgs)
            sql = """INSERT INTO epg (date, channel, time, pg)
                     VALUES (%s, %s, %s, %s)"""
            self.cursor.executemany(sql, bulkdata)
            self.conn.commit()
        except MySQLdb.Error:
            self.conn.rollback()

    def process_item(self, item, spider):
        self.pgs.append((item['date'], item['channel'],
                         item['time'], item['pg']))
        if len(self.pgs) >= 1000:
            self.bulk_insert_to_mysql(self.pgs)
            del self.pgs[:]  # clear the buffer
        return item

    def close_spider(self, spider):
        # flush the last, possibly partial, batch when the spider closes
        print "closing spider, last commit", len(self.pgs)
        self.bulk_insert_to_mysql(self.pgs)
        self.cursor.close()
        self.conn.close()  # returns the connection to the pool
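One step the article leaves implicit is registering the pipeline with Scrapy. An entry along these lines in the project's settings.py is required; the module path myproject.pipelines is a placeholder for your own project:

# settings.py -- enable the pipeline (module path is hypothetical)
ITEM_PIPELINES = {
    'myproject.pipelines.MySQLStorePipeline': 300,  # 0-1000, lower runs first
}

The code also assumes the epg table already exists. A schema like the following would match the INSERT statement, though the column types here are guesses rather than taken from the article:

CREATE TABLE epg (
    date    VARCHAR(16),   -- broadcast date
    channel VARCHAR(64),   -- channel name
    time    VARCHAR(16),   -- air time
    pg      VARCHAR(255)   -- programme title
) DEFAULT CHARSET=utf8;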
Related reading:
- Java: connecting to MySQL and batch-writing data
- Python: reading and writing data in MySQL
- Python crawler data-writing operations
- Python crawler data storage in MySQL [1]: connection methods
- Node.js: storing data in MySQL & creating a connection pool
- Spark: batch writes to Redis (pipeline + chunked commits)
- Multithreaded Java Bilibili crawler and 450,000 video records: MySQL batch-write performance comparison, with code
- Redis bulk data upload (pipeline)
- Python MySQL installation + a Scrapy crawler writing Items to a MySQL database
- Python: batch-inserting data into MySQL
- Python crawlers 7: a custom Item Pipeline that stores data in MongoDB
- Python crawler: connecting to MongoDB to store data
- Redis: batch-inserting hash data with pipeline
- [Redis] RedisCluster: batch reads with pipeline
- Web scraping with Python: creating a crawler
- Writing Excel data into MySQL with Python
- MySQL: optimizing batch writes of 200 GPS records per second