My Python Crawler (Part 1) plus: Python Databases -- MySQL

Source: Internet · Editor: 程序博客网 · Date: 2024/05/29

Changes in this post:

(a) Search only 5- to 6-character domain names.

(b) Record available domain names in a database.

Task (a):

Since I don't know multithreading yet, I only handle 5-character domain names.

It's simple: just tweak the code from last time a little, as follows:

```python
import re
import urllib
import urllib2
import cookielib

class ChDm_Spider:
    def getpage(self, name, suffix='.com'):
        data = {"d_name": "", "dtype": "common", "drand": ".1416113688132"}
        data["d_name"] = name + suffix
        post_data = urllib.urlencode(data)
        cj = cookielib.CookieJar()  # must instantiate the jar, not just reference the class
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        headers = {"User-agent": "Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0"}
        req = urllib2.Request("http://www.zgsj.com/domain_reg/domaintrans.asp", post_data, headers)
        content = opener.open(req)  # use the cookie-aware opener, not plain urllib2.urlopen
        c = content.read()
        # the result page marks available domains in green
        pattern = re.compile('color:green;')
        p = pattern.findall(c)
        if p:
            print name

    def addname(self):
        # enumerate every 5-letter lowercase combination (ASCII 97-122 is a-z)
        for n1 in range(97, 123):
            for n2 in range(97, 123):
                for n3 in range(97, 123):
                    for n4 in range(97, 123):
                        for n5 in range(97, 123):
                            self.getpage(chr(n1) + chr(n2) + chr(n3) + chr(n4) + chr(n5))

myspider = ChDm_Spider()
myspider.addname()
```
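As a side note, the five nested loops above can be written more compactly with `itertools.product`; a minimal sketch (using the `print()` function form so it runs under both Python 2 and 3):

```python
import itertools
import string

# every 5-letter lowercase combination, generated lazily;
# equivalent to the five nested loops over chr(97)..chr(122)
names = (''.join(t) for t in itertools.product(string.ascii_lowercase, repeat=5))

print(next(names))   # first candidate: 'aaaaa'
print(26 ** 5)       # total number of candidates: 11881376
```

`product` yields tuples in the same lexicographic order as the nested loops, so the crawl order is unchanged.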


Task (b):

Two things need to be installed: MySQL, and MySQLdb, the Python module for MySQL.
I won't go into the installation details here; those two installs tormented me for almost a week. When learning a language, the annoying part is often not the programming but installing software like this. Even now I'm still not comfortable with python(x,y) and Anaconda, especially Anaconda's built-in package manager, which I still haven't figured out.
For operating MySQL from Python, I mainly referred to this blog post:
http://blog.csdn.net/zm2714/article/details/7974890
In this program, the main operations are creating a table and inserting data. Roughly like this:
```python
import sys
import MySQLdb

# connect to the database
try:
    conn = MySQLdb.connect(host='127.0.0.1', user='root', passwd='', db='test')
except Exception, e:
    print e
    sys.exit()

# get a cursor object to perform operations
cursor = conn.cursor()

# create the table
sql = "create table if not exists test1(name varchar(128) primary key)"
cursor.execute(sql)

# insert one row
sql = "insert into test1(name) values ('%s')" % ("aaaaa")
try:
    cursor.execute(sql)
except Exception, e:
    print e

sql = "insert into test1(name) values ('%s')" % ("bbbbb")
try:
    cursor.execute(sql)
except Exception, e:
    print e

# insert several rows at once
sql = "insert into test1(name) values (%s)"
val = (("ccccc"), ("ddddd"), ("eeeee"))
try:
    cursor.executemany(sql, val)
except Exception, e:
    print e

# commit, or the inserts may not be persisted (autocommit is off by default)
conn.commit()

# query the data back
sql = "select * from test1"
cursor.execute(sql)
alldata = cursor.fetchall()

# alldata is a tuple of rows; if anything came back, print each row
if alldata:
    for rec in alldata:
        print rec[0]

cursor.close()
conn.close()
```
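One caveat about the example above: building SQL with Python's `%` string formatting is vulnerable to SQL injection, while passing values as a separate argument lets the driver escape them. A minimal sketch of the same insert/select pattern, using the standard-library `sqlite3` module instead of MySQLdb so it runs without a MySQL server (note that `sqlite3` uses `?` placeholders where MySQLdb uses `%s`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cursor = conn.cursor()
cursor.execute("create table if not exists test1(name varchar(128) primary key)")

# single insert with a bound parameter -- the driver escapes the value
cursor.execute("insert into test1(name) values (?)", ("aaaaa",))

# executemany for several rows at once
cursor.executemany("insert into test1(name) values (?)",
                   [("ccccc",), ("ddddd",), ("eeeee",)])
conn.commit()

cursor.execute("select * from test1")
rows = sorted(rec[0] for rec in cursor.fetchall())
print(rows)  # ['aaaaa', 'ccccc', 'ddddd', 'eeeee']
conn.close()
```

The same pattern works with MySQLdb by swapping the connect call and the placeholder style.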


Final implementation:

Putting the two together, it looks roughly like this; it's fairly simple.
```python
import sys
import re
import urllib
import urllib2
import cookielib
import MySQLdb

val = []  # available domain names collected by the spider

class ChDm_Spider:
    def getpage(self, name, suffix='.com'):
        data = {"d_name": "", "dtype": "common", "drand": ".1416113688132"}
        data["d_name"] = name + suffix
        post_data = urllib.urlencode(data)
        cj = cookielib.CookieJar()  # must instantiate the jar
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        headers = {"User-agent": "Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0"}
        req = urllib2.Request("http://www.zgsj.com/domain_reg/domaintrans.asp", post_data, headers)
        content = opener.open(req)  # use the cookie-aware opener
        c = content.read()
        # available domains are marked in green on the result page
        pattern = re.compile('color:green;')
        p = pattern.findall(c)
        if p:
            val.append(name)

    def addname(self):
        # every 5-letter lowercase combination
        for n1 in range(97, 123):
            for n2 in range(97, 123):
                for n3 in range(97, 123):
                    for n4 in range(97, 123):
                        for n5 in range(97, 123):
                            self.getpage(chr(n1) + chr(n2) + chr(n3) + chr(n4) + chr(n5))

myspider = ChDm_Spider()
myspider.addname()

# connect to the database
try:
    conn = MySQLdb.connect(host='127.0.0.1', user='root', passwd='', db='test')
except Exception, e:
    print e
    sys.exit()

# get a cursor object to perform operations
cursor = conn.cursor()

# create the table
sql = "create table if not exists test1(name varchar(128) primary key)"
cursor.execute(sql)

# insert all the collected names at once
sql = "insert into test1(name) values (%s)"
try:
    cursor.executemany(sql, val)
except Exception, e:
    print e

# commit, or the inserts may not be persisted
conn.commit()

# query the data back
sql = "select * from test1"
cursor.execute(sql)
alldata = cursor.fetchall()

# alldata is a tuple of rows; if anything came back, print each row
if alldata:
    for rec in alldata:
        print rec[0]

cursor.close()
conn.close()
```
This runs really, really slowly.
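The slowness is mostly waiting on HTTP responses, so even without full-blown thread management, a thread pool can check many names in parallel. A sketch using `multiprocessing.dummy.Pool`, with a hypothetical `check` function standing in for `getpage` (the real version would issue the HTTP request and return the name when the page marks it available):

```python
from multiprocessing.dummy import Pool   # thread-backed Pool, good for I/O-bound work

def check(name):
    # stand-in for ChDm_Spider.getpage(); here we pretend names ending in 'a' are free
    return name if name.endswith('a') else None

names = ['aaaaa', 'aaaab', 'aaaba']
pool = Pool(4)                           # 4 worker threads run check() concurrently
available = [n for n in pool.map(check, names) if n]
pool.close()
pool.join()
print(available)  # ['aaaaa', 'aaaba']
```

`pool.map` preserves input order, so the results line up with the candidate list; with real network calls the wall-clock time drops roughly by the pool size.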
