[Python crawler] Scraping NetEase News with Python


Two steps:

① Scrape the NetEase news titles and links

② Store them in MySQL

Here's the code!

# -*- coding: utf-8 -*-
"""
Created on Thu Apr 06 2017
@author: Administrator
"""
import re
import time

import pandas as pd
import urllib2
from bs4 import BeautifulSoup
from mysql import connector

# Fetch the NetEase homepage and collect every anchor that links to a news page
url = "http://www.163.com"
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all(name="a", attrs={"href": re.compile("http://news")})

# Record (url, title, capture time) for each link
z = []
for i in links:
    z.append((i.get("href"),
              i.get_text(),
              time.strftime("%Y-%m-%d %X", time.localtime())))

df = pd.DataFrame(z)
df.columns = ["news_url", "news_title", "record_time"]  # column names must match the MySQL field names

"""
************************** Everything below is the MySQL part **************************

The target table, created beforehand:

CREATE TABLE `news` (
    `news_url` TEXT,
    `news_title` VARCHAR(100) DEFAULT NULL,
    `record_time` DATETIME NOT NULL,
    `ID` INT(11) NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;
"""
conn = connector.connect(
    host='localhost',
    port=3306,
    user='root',
    password='123',
    database='neteasynews',
)
cur = conn.cursor()
# pandas' to_sql no longer accepts a raw MySQL connection (the old
# flavor="mysql" path was removed), so insert through the cursor instead
rows = [tuple(r) for r in df.itertuples(index=False)]
cur.executemany(
    "INSERT INTO news (news_url, news_title, record_time) VALUES (%s, %s, %s)",
    rows)
conn.commit()
cur.close()
conn.close()
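
The script above targets Python 2 (urllib2). As a rough sketch of the same fetch-and-store flow on Python 3, the version below swaps in requests and gives pandas.to_sql a SQLAlchemy engine, which it does support; the credentials and database name simply mirror the values above and need adjusting for your own server. This assumes requests, SQLAlchemy, and mysql-connector-python are installed.

# Hypothetical Python 3 variant of the same flow (not from the original post)
import re
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup
from sqlalchemy import create_engine

# Fetch the homepage and collect the news links, as in the Python 2 script
html = requests.get("http://www.163.com").text
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", href=re.compile("http://news"))

rows = [(a.get("href"), a.get_text(), time.strftime("%Y-%m-%d %X"))
        for a in links]
df = pd.DataFrame(rows, columns=["news_url", "news_title", "record_time"])

# Credentials mirror the script above; change them for your own MySQL server
engine = create_engine("mysql+mysqlconnector://root:123@localhost:3306/neteasynews")
df.to_sql("news", engine, if_exists="append", index=False)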
For the past few days I haven't found a good way to make the Python script run automatically on a schedule and write the updates into the database. If you know a good way, please leave a comment. Thanks.
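
One simple option, sketched below under the assumption that the scrape-and-insert steps are wrapped in a function (update_news here is a hypothetical name): rerun it in a loop at a fixed interval. The one-hour interval is an arbitrary choice for illustration.

# Minimal scheduling sketch (one of several options, not from the original post)
import time

def update_news():
    # ... fetch the page, build the rows, insert into MySQL as above ...
    pass

while True:
    update_news()
    time.sleep(60 * 60)  # wait one hour between runs; pick any interval

On Linux, a cron entry such as `0 * * * * python /path/to/scraper.py` achieves the same thing without keeping a process running.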
