[Python crawler] Scraping NetEase News with Python


Two steps:

① Scrape the NetEase news titles and links

② Store them in MySQL

Here's the code!

# -*- coding: utf-8 -*-
"""
Created on Thu Apr 06 2017
@author: Administrator
"""
import re
import time

import pandas as pd
import urllib2
from bs4 import BeautifulSoup
from mysql import connector

# Fetch the NetEase homepage and collect every anchor that links to a news page
url = "http://www.163.com"
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all(name="a", attrs={"href": re.compile("http://news")})

# Record (url, title, capture time) for each link
z = []
for i in links:
    z.append((i.get("href"),
              i.get_text(),
              time.strftime("%Y-%m-%d %X", time.localtime())))

df = pd.DataFrame(z)
df.columns = ["news_url", "news_title", "record_time"]  # column names must match the MySQL field names

"""
************************** Everything below is the MySQL part **************************

The target table, created beforehand:

CREATE TABLE `news` (
    `news_url` TEXT,
    `news_title` VARCHAR(100) DEFAULT NULL,
    `record_time` DATETIME NOT NULL,
    `ID` INT(11) NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;
"""
conn = connector.connect(
    host='localhost',
    port=3306,
    user='root',
    password='123',
    database='neteasynews',
)
cur = conn.cursor()
# pandas' to_sql no longer accepts a raw MySQL connection (the old
# flavor="mysql" path was removed), so insert through the cursor instead
rows = [tuple(r) for r in df.itertuples(index=False)]
cur.executemany(
    "INSERT INTO news (news_url, news_title, record_time) VALUES (%s, %s, %s)",
    rows)
conn.commit()
cur.close()
conn.close()
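
The script above targets Python 2 (urllib2). As a rough sketch of the same fetch-and-store flow on Python 3, the version below swaps in requests and gives pandas.to_sql a SQLAlchemy engine, which it does support; the credentials and database name simply mirror the values above and need adjusting for your own server. This assumes requests, SQLAlchemy, and mysql-connector-python are installed.

# Hypothetical Python 3 variant of the same flow (not from the original post)
import re
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup
from sqlalchemy import create_engine

# Fetch the homepage and collect the news links, as in the Python 2 script
html = requests.get("http://www.163.com").text
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", href=re.compile("http://news"))

rows = [(a.get("href"), a.get_text(), time.strftime("%Y-%m-%d %X"))
        for a in links]
df = pd.DataFrame(rows, columns=["news_url", "news_title", "record_time"])

# Credentials mirror the script above; change them for your own MySQL server
engine = create_engine("mysql+mysqlconnector://root:123@localhost:3306/neteasynews")
df.to_sql("news", engine, if_exists="append", index=False)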
For the past few days I haven't found a good way to make the Python script run automatically on a schedule and write the updates into the database. If you know a good way, please leave a comment. Thanks.
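
One simple option, sketched below under the assumption that the scrape-and-insert steps are wrapped in a function (update_news here is a hypothetical name): rerun it in a loop at a fixed interval. The one-hour interval is an arbitrary choice for illustration.

# Minimal scheduling sketch (one of several options, not from the original post)
import time

def update_news():
    # ... fetch the page, build the rows, insert into MySQL as above ...
    pass

while True:
    update_news()
    time.sleep(60 * 60)  # wait one hour between runs; pick any interval

On Linux, a cron entry such as `0 * * * * python /path/to/scraper.py` achieves the same thing without keeping a process running.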
