Practice: Scraping Yicai (第一财经) with the BS4 Module

Source: Internet | Published by: 天尚网络机顶盒 | Editor: 程序博客网 | Date: 2024/05/09 06:29

This exercise uses the requests package and Beautiful Soup.

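Note that the HTML returned by the requests GET call has to be treated as 'UTF-8': set the response's encoding to "utf-8" before reading its text.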
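To see why the encoding matters, here is a small offline sketch that needs no network access. It assumes the server sends UTF-8 bytes without declaring a charset, in which case requests falls back to ISO-8859-1 for text responses. The sample string and variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Offline illustration: the same bytes decoded with two different codecs.
original = u"第一财经"

# What the server would send over the wire (assumed to be UTF-8 here).
raw_bytes = original.encode("utf-8")

# Decoding with the wrong codec yields mojibake: one character per byte.
wrong = raw_bytes.decode("iso-8859-1")

# Decoding with the right codec restores the original text.
right = raw_bytes.decode("utf-8")

print(len(raw_bytes), len(wrong), len(right))
print(right == original)
```

Setting `html.encoding = "utf-8"` on the response before reading `.text` makes requests decode the body the second way.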

The output is each news headline followed by its URL.

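# coding=utf-8
from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent so the request is not rejected as a bot
h1 = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
}

# HTTP proxy list; only the first entry is used
o_g = ['113.200.159.155:9999']
pro = {'http': o_g[0]}

# Fetch the page, then force UTF-8 before reading the text
html = requests.get('http://www.yicai.com/news/cbndata/', timeout=20, headers=h1, proxies=pro)
html.encoding = "utf-8"
html = html.text

# Each headline is an <h3> with this exact class string, wrapping an <a> link
soup = BeautifulSoup(html, 'lxml')
con = soup.find_all('h3', class_="f-ff1 f-fwn f-fs22")
for item in con:
    print(item.get_text())        # headline text
    print(item.a.attrs['href'])   # link URL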
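The parsing step of the script above can be tried without touching the network. The following is a minimal sketch, not the author's code: the HTML sample is made up, `extract_headlines` is a hypothetical helper, and the built-in `html.parser` stands in for `lxml` so no extra parser is needed. It uses a CSS selector instead of `class_="f-ff1 f-fwn f-fs22"`, because passing a multi-class string to `class_` matches only that exact attribute value, while the selector matches the three classes in any order.

```python
from bs4 import BeautifulSoup

# Made-up sample that mimics the headline markup targeted by the script.
SAMPLE_HTML = """
<div>
  <h3 class="f-ff1 f-fwn f-fs22"><a href="/news/100.html">Sample headline one</a></h3>
  <h3 class="f-fs22 f-ff1 f-fwn"><a href="/news/101.html">Sample headline two</a></h3>
  <h3 class="f-ff1 f-fwn f-fs22">Headline without a link</h3>
  <h3 class="other"><a href="/news/102.html">Unrelated heading</a></h3>
</div>
"""


def extract_headlines(html):
    """Return a list of (title, url) pairs for the headline <h3> elements."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Match <h3> elements carrying all three classes, in any order.
    for h3 in soup.select("h3.f-ff1.f-fwn.f-fs22"):
        link = h3.find("a")
        # Skip headings that have no link instead of raising an error.
        if link is None or not link.get("href"):
            continue
        results.append((h3.get_text(strip=True), link["href"]))
    return results


headlines = extract_headlines(SAMPLE_HTML)
for title, url in headlines:
    print(title, url)
```

Returning pairs rather than printing inside the loop keeps the extraction easy to reuse, for example to write the results to a file.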
