Putting Python to work for yourself: scraping web page content
Source: Internet  Published: 阿里云 直播  Editor: 程序博客网  Time: 2024/05/21 06:36
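# Python 2 script: urllib.urlopen and the print statement do not exist in Python 3.
import urllib

# Download the index page that lists the speeches.
url = "http://www.putclub.com/html/radio/VOA/presidentspeech/index.html"
wp = urllib.urlopen(url)
print "start download..."
content = wp.read()
print content.count("center_box")

# Cut out the first link that follows the "center_box" marker.
content = content[content.find("center_box") + 1:]
# +7 skips href=, the opening quote and one more character (presumably the leading slash);
# -2 drops the two characters before target (presumably the closing quote and a space).
content = content[content.find("href=") + 7:content.find("target") - 2]
filename = content  # relative path of the linked article, used further down for the file name
url = "http://www.putclub.com/" + content
print content
print url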
# Download the article page that the extracted link points to.
wp = urllib.urlopen(url)
print "start download..."
content = wp.read()
print content.count("<div class=\"content\"")

# Keep only the part between the "info end" comment and the pager block.
content = content[content.find("<!--info end------->"):]
content = content[:content.find("<div class=\"dede_pages\"") - 1]

# Build a flat file name from the path after "presidentspeech/" by turning every "/" into "-".
filename = filename[filename.find("presidentspeech") + len("presidentspeech/"):]
filename = filename.replace("/", "-")

# Save the extracted fragment to disk and echo it.
fp = open(filename, "w")
fp.write(content)
fp.close()
print content
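
The script above targets Python 2. As a rough illustration only, the same slicing steps could be written for Python 3 roughly as follows. The function names (extract_first_link, extract_body, make_filename, fetch, download_latest) are hypothetical and not part of the original post, the utf-8 page encoding is an assumption, and the link is located by its quotes rather than by the fixed +7 / -2 offsets, so the result may differ by a character or two from what the original script produces.

```python
from urllib.request import urlopen

BASE_URL = "http://www.putclub.com/"


def extract_first_link(index_html):
    """Return the relative path of the first link after the "center_box" marker."""
    marker = index_html.find("center_box")
    if marker == -1:
        raise ValueError("center_box marker not found")
    tail = index_html[marker:]
    start = tail.find('href="')
    if start == -1:
        raise ValueError("no href found after center_box")
    start += len('href="')
    end = tail.find('"', start)
    if end == -1:
        raise ValueError("unterminated href attribute")
    # Drop a leading slash so the path can be appended to BASE_URL.
    return tail[start:end].lstrip("/")


def extract_body(article_html):
    """Return the fragment from the "info end" comment up to the pager block."""
    start = article_html.find("<!--info end------->")
    end = article_html.find('<div class="dede_pages"')
    if start == -1 or end == -1 or end < start:
        raise ValueError("article markers not found")
    return article_html[start:end]


def make_filename(path):
    """Flatten the part of the path after "presidentspeech/" into a file name."""
    key = "presidentspeech/"
    pos = path.find(key)
    if pos == -1:
        raise ValueError("path does not contain presidentspeech/")
    return path[pos + len(key):].replace("/", "-")


def fetch(url, encoding="utf-8"):
    """Download a page as text. The encoding is an assumption about the site."""
    with urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


def download_latest(index_url):
    """Fetch the index, follow the first link, and save the article fragment."""
    path = extract_first_link(fetch(index_url))
    body = extract_body(fetch(BASE_URL + path))
    name = make_filename(path)
    with open(name, "w", encoding="utf-8") as fp:
        fp.write(body)
    return name
```

Nothing is downloaded until download_latest is called with the index URL. Keeping the three parsing helpers free of network access means they can be tried on a saved copy of the page first, and they raise ValueError when a marker is missing instead of silently slicing from position -1.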