My Python Crawler (Part 1) plus: Python Databases -- MySQL

Source: Internet · Editor: 程序博客网 · Date: 2024/05/29

Changes in this post:

(a) Search only 5- to 6-character domain names.

(b) Record available domain names in a database.

Task (a):

Since I don't know multithreading yet, I only handle 5-character domain names.

It's simple: just tweak the code from last time a little, as follows:

```python
import re
import urllib
import urllib2
import cookielib

class ChDm_Spider:
    def getpage(self, name, suffix='.com'):
        data = {"d_name": "", "dtype": "common", "drand": ".1416113688132"}
        data["d_name"] = name + suffix
        post_data = urllib.urlencode(data)
        cj = cookielib.CookieJar()  # must instantiate the jar, not just reference the class
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        headers = {"User-agent": "Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0"}
        req = urllib2.Request("http://www.zgsj.com/domain_reg/domaintrans.asp", post_data, headers)
        content = opener.open(req)  # use the cookie-aware opener, not plain urllib2.urlopen
        c = content.read()
        # the result page marks available domains in green
        pattern = re.compile('color:green;')
        p = pattern.findall(c)
        if p:
            print name

    def addname(self):
        # enumerate every 5-letter lowercase combination (ASCII 97-122 is a-z)
        for n1 in range(97, 123):
            for n2 in range(97, 123):
                for n3 in range(97, 123):
                    for n4 in range(97, 123):
                        for n5 in range(97, 123):
                            self.getpage(chr(n1) + chr(n2) + chr(n3) + chr(n4) + chr(n5))

myspider = ChDm_Spider()
myspider.addname()
```
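As a side note, the five nested loops above can be written more compactly with `itertools.product`; a minimal sketch (using the `print()` function form so it runs under both Python 2 and 3):

```python
import itertools
import string

# every 5-letter lowercase combination, generated lazily;
# equivalent to the five nested loops over chr(97)..chr(122)
names = (''.join(t) for t in itertools.product(string.ascii_lowercase, repeat=5))

print(next(names))   # first candidate: 'aaaaa'
print(26 ** 5)       # total number of candidates: 11881376
```

`product` yields tuples in the same lexicographic order as the nested loops, so the crawl order is unchanged.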


Task (b):

Two things need to be installed: MySQL, and MySQLdb, the Python module for MySQL.
I won't go into the installation details here; those two installs tormented me for almost a week. When learning a language, the annoying part is often not the programming but installing software like this. Even now I'm still not comfortable with python(x,y) and Anaconda, especially Anaconda's built-in package manager, which I still haven't figured out.
For operating MySQL from Python, I mainly referred to this blog post:
http://blog.csdn.net/zm2714/article/details/7974890
In this program, the main operations are creating a table and inserting data. Roughly like this:
```python
import sys
import MySQLdb

# connect to the database
try:
    conn = MySQLdb.connect(host='127.0.0.1', user='root', passwd='', db='test')
except Exception, e:
    print e
    sys.exit()

# get a cursor object to perform operations
cursor = conn.cursor()

# create the table
sql = "create table if not exists test1(name varchar(128) primary key)"
cursor.execute(sql)

# insert one row
sql = "insert into test1(name) values ('%s')" % ("aaaaa")
try:
    cursor.execute(sql)
except Exception, e:
    print e

sql = "insert into test1(name) values ('%s')" % ("bbbbb")
try:
    cursor.execute(sql)
except Exception, e:
    print e

# insert several rows at once
sql = "insert into test1(name) values (%s)"
val = (("ccccc"), ("ddddd"), ("eeeee"))
try:
    cursor.executemany(sql, val)
except Exception, e:
    print e

# commit, or the inserts may not be persisted (autocommit is off by default)
conn.commit()

# query the data back
sql = "select * from test1"
cursor.execute(sql)
alldata = cursor.fetchall()

# alldata is a tuple of rows; if anything came back, print each row
if alldata:
    for rec in alldata:
        print rec[0]

cursor.close()
conn.close()
```
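One caveat about the example above: building SQL with Python's `%` string formatting is vulnerable to SQL injection, while passing values as a separate argument lets the driver escape them. A minimal sketch of the same insert/select pattern, using the standard-library `sqlite3` module instead of MySQLdb so it runs without a MySQL server (note that `sqlite3` uses `?` placeholders where MySQLdb uses `%s`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("create table if not exists test1(name varchar(128) primary key)")

# single insert with a bound parameter -- the driver escapes the value
cursor.execute("insert into test1(name) values (?)", ("aaaaa",))

# executemany for several rows at once
cursor.executemany("insert into test1(name) values (?)",
                   [("ccccc",), ("ddddd",), ("eeeee",)])
conn.commit()

cursor.execute("select * from test1")
rows = sorted(rec[0] for rec in cursor.fetchall())
print(rows)  # ['aaaaa', 'ccccc', 'ddddd', 'eeeee']
conn.close()
```

The same pattern works with MySQLdb by swapping the connect call and the placeholder style.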


Final implementation:

Putting the two together, it looks roughly like this; it's fairly simple.
```python
import sys
import re
import urllib
import urllib2
import cookielib
import MySQLdb

val = []  # available domain names collected by the spider

class ChDm_Spider:
    def getpage(self, name, suffix='.com'):
        data = {"d_name": "", "dtype": "common", "drand": ".1416113688132"}
        data["d_name"] = name + suffix
        post_data = urllib.urlencode(data)
        cj = cookielib.CookieJar()  # must instantiate the jar
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        headers = {"User-agent": "Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0"}
        req = urllib2.Request("http://www.zgsj.com/domain_reg/domaintrans.asp", post_data, headers)
        content = opener.open(req)  # use the cookie-aware opener
        c = content.read()
        # available domains are marked in green on the result page
        pattern = re.compile('color:green;')
        p = pattern.findall(c)
        if p:
            val.append(name)

    def addname(self):
        # every 5-letter lowercase combination
        for n1 in range(97, 123):
            for n2 in range(97, 123):
                for n3 in range(97, 123):
                    for n4 in range(97, 123):
                        for n5 in range(97, 123):
                            self.getpage(chr(n1) + chr(n2) + chr(n3) + chr(n4) + chr(n5))

myspider = ChDm_Spider()
myspider.addname()

# connect to the database
try:
    conn = MySQLdb.connect(host='127.0.0.1', user='root', passwd='', db='test')
except Exception, e:
    print e
    sys.exit()

# get a cursor object to perform operations
cursor = conn.cursor()

# create the table
sql = "create table if not exists test1(name varchar(128) primary key)"
cursor.execute(sql)

# insert all the collected names at once
sql = "insert into test1(name) values (%s)"
try:
    cursor.executemany(sql, val)
except Exception, e:
    print e

# commit, or the inserts may not be persisted
conn.commit()

# query the data back
sql = "select * from test1"
cursor.execute(sql)
alldata = cursor.fetchall()

# alldata is a tuple of rows; if anything came back, print each row
if alldata:
    for rec in alldata:
        print rec[0]

cursor.close()
conn.close()
```
This runs really, really slowly.
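The slowness is mostly waiting on HTTP responses, so even without full-blown thread management, a thread pool can check many names in parallel. A sketch using `multiprocessing.dummy.Pool`, with a hypothetical `check` function standing in for `getpage` (the real version would issue the HTTP request and return the name when the page marks it available):

```python
from multiprocessing.dummy import Pool   # thread-backed Pool, good for I/O-bound work

def check(name):
    # stand-in for ChDm_Spider.getpage(); here we pretend names ending in 'a' are free
    return name if name.endswith('a') else None

names = ['aaaaa', 'aaaab', 'aaaba']
pool = Pool(4)                           # 4 worker threads run check() concurrently
available = [n for n in pool.map(check, names) if n]
pool.close()
pool.join()
print(available)  # ['aaaaa', 'aaaba']
```

`pool.map` preserves input order, so the results line up with the candidate list; with real network calls the wall-clock time drops roughly by the pool size.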
