Scraping proxy server IPs with python3

Source: Internet. Editor: 程序博客网. Date: 2024/06/05 21:09

This post uses python3 to fetch a web page, parse it, and then write the results to a file and to a database.

The page is parsed with BeautifulSoup, and the database writes go through pymysql.
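The storage step can be tried without a MySQL server. The sketch below uses the standard library's sqlite3 as a stand-in for pymysql, a swap made only so the example runs anywhere. The table name proxy_ip and column ip match the ones used in this post; save_proxies is a hypothetical helper name. Note that sqlite3 uses ? placeholders where pymysql uses %s.

```python
import sqlite3


def save_proxies(conn, proxies):
    """Replace the contents of proxy_ip with the given "ip:port" strings.

    Returns the number of rows in the table afterwards.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS proxy_ip (ip TEXT)")
    # Clear the old rows first, then insert the fresh list in one batch.
    cur.execute("DELETE FROM proxy_ip")
    # Placeholders let the driver do the quoting, so no SQL string
    # has to be assembled by hand from scraped text.
    cur.executemany(
        "INSERT INTO proxy_ip (ip) VALUES (?)",
        [(p,) for p in proxies],
    )
    conn.commit()
    cur.execute("SELECT COUNT(*) FROM proxy_ip")
    return cur.fetchone()[0]


# Sample values for illustration only, not real proxies.
conn = sqlite3.connect(":memory:")
print(save_proxies(conn, ["1.2.3.4:8080", "5.6.7.8:3128"]))
```

Passing the values as parameters also avoids a broken statement when a scraped cell happens to contain a quote character.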

Both are third-party libraries, of course, so they need to be installed first.

The full code is as follows:

#!/usr/bin/python3
import urllib.request

import pymysql
from bs4 import BeautifulSoup

url = "http://proxy.com.ru"
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(urllib.request.urlopen(url), "html.parser", from_encoding="utf-8")
tables = soup.find_all("table")

# The proxy list is taken from the 8th table on the page (index 7).
if len(tables) > 7:
    print("Start scraping and parsing IPs")
    values = []
    # Skip the header row; cell 1 holds the IP and cell 2 the port.
    for tr in tables[7].find_all("tr")[1:]:
        tds = tr.find_all("td")
        if len(tds) > 2:
            values.append(tds[1].text + ":" + tds[2].text)

    with open("ip.txt", "w") as f:
        for value in values:
            f.write(value + "\n")

    # Database operations
    try:
        conn = pymysql.connect(host="localhost", user="root", passwd="1234",
                               db="test", charset="utf8")
        cur = conn.cursor()
        # pymysql runs one statement per execute() by default, so the
        # delete and the insert are issued separately.
        cur.execute("delete from proxy_ip")
        if values:
            cur.executemany("insert into proxy_ip (ip) values (%s)",
                            [(v,) for v in values])
        conn.commit()
        cur.close()
        conn.close()
    except pymysql.Error as e:
        print("pyMysql Error {0}".format(e))

print("Done")
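The script stores whatever text it finds in the second and third cells of each row, so a layout change on the page would put junk into ip.txt and the table. Here is a minimal sketch of a check that could run before writing, assuming each entry has the form ip:port. parse_proxy is a hypothetical helper, not part of the original script.

```python
import ipaddress


def parse_proxy(text):
    """Return (ip, port) if text looks like "ip:port", otherwise None."""
    host, sep, port = text.strip().rpartition(":")
    # Require a colon and a port made only of ASCII digits.
    if not sep or not (port.isascii() and port.isdigit()):
        return None
    try:
        # ip_address() raises ValueError for anything that is not a
        # well-formed IPv4 or IPv6 address.
        ip = ipaddress.ip_address(host)
    except ValueError:
        return None
    port_number = int(port)
    if not 1 <= port_number <= 65535:
        return None
    return str(ip), port_number


# Sample values for illustration only.
candidates = ["1.2.3.4:8080", "IP:PORT", "999.1.1.1:80"]
valid = [c for c in candidates if parse_proxy(c) is not None]
print(valid)
```

Filtering the scraped list this way before the file and database writes keeps both outputs limited to well-formed entries.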

