Crawler Lesson 4 (Scraping a News Site with RegEx)

Source: Internet | Published by: python numpy | Editor: 程序博客网 | Time: 2024/05/16 11:52

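import re

import requests


def crawler163():
    # Download the 163.com home page as text.
    content = requests.get('http://www.163.com/').text
    # Stage 1: cut out each news tab block, from its opening <div> to the first </ul>.
    pattern1 = re.compile(r'<div class="tab_main clearfix".*?</ul>', re.S)
    results_part = re.findall(pattern1, content)
    # Stage 2: inside those blocks, capture each list item's link and title.
    pattern2 = re.compile(r'<li.*?href="(.*?)">(.*?)</a>', re.S)
    # Join the blocks instead of calling str() on the list: str() would turn real
    # newlines into literal "\n" text that the \s cleanup below cannot remove.
    results_filter = re.findall(pattern2, ''.join(results_part))
    for result in results_filter:
        http, title = result
        # Strip all whitespace from the URL and the title.
        http = re.sub(r'\s', '', http)
        title = re.sub(r'\s', '', title)
        print(http, title)


if __name__ == "__main__":
    crawler163()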

Scraping a news site with RegEx
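The two-stage matching used in the script above can be tried offline. The following is only a minimal sketch: the HTML fragment, the example.com URLs, and the names `SAMPLE_HTML`, `BLOCK_PATTERN`, `LINK_PATTERN`, and `extract_links` are all made up for illustration, and the real 163.com markup may differ or change at any time.

```python
import re

# Hypothetical HTML standing in for the downloaded page (not real 163.com markup).
SAMPLE_HTML = """
<div class="tab_main clearfix">
  <ul>
    <li><a href="http://news.example.com/a.html">
      Markets
    </a></li>
    <li><a href="http://news.example.com/b.html">Sports</a></li>
  </ul>
</div>
<div class="other">
  <ul><li><a href="http://news.example.com/c.html">Ignored</a></li></ul>
</div>
"""

# Stage 1: non-greedy match from the opening <div> to the first </ul>.
# re.S lets "." match newlines so the pattern can span several lines.
BLOCK_PATTERN = re.compile(r'<div class="tab_main clearfix".*?</ul>', re.S)
# Stage 2: capture the href value and the link text of each list item.
LINK_PATTERN = re.compile(r'<li.*?href="(.*?)">(.*?)</a>', re.S)


def extract_links(html):
    """Return (url, title) pairs found inside the matched news blocks."""
    links = []
    for block in BLOCK_PATTERN.findall(html):
        for url, title in LINK_PATTERN.findall(block):
            # Remove all whitespace, as the original script does.
            links.append((re.sub(r'\s', '', url), re.sub(r'\s', '', title)))
    return links


if __name__ == "__main__":
    for url, title in extract_links(SAMPLE_HTML):
        print(url, title)
```

Removing every whitespace character suits titles written without spaces, such as Chinese headlines, but it would also glue together the words of a space-separated title. Regular expressions are fragile against markup changes, so an HTML parser may be a safer choice for anything beyond a quick exercise.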
